Dietary Determinants of Allergic, Immunologic, and Cardiometabolic Diseases

BMIN5030/EPID600 Final Project

Author

Stan Gabryszewski


1 Overview

This project used data from the nationally representative National Health and Nutrition Examination Survey (NHANES) to explore links between dietary determinants and allergic, immunologic, and cardiometabolic outcomes in U.S. adults, focusing on those between the ages of 20 and 40. Using survey-weighted univariable and multivariable regression analyses, I examined whether diet quality (primarily assessed via the Dietary Inflammatory Index) and various sociodemographic (e.g., age, sex, race/ethnicity, poverty-income ratio), behavioral (e.g., alcohol, smoking), and clinical (e.g., body mass index [BMI], blood pressure, hemoglobin A1c [HbA1c], vitamin D level) features were associated with self-reported asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, and hypertension. In addition, I used exploratory machine learning models to identify predictors that may contribute to risk across these diseases.

I discussed this project with Dr. Robert Grundmeier (Professor of Pediatrics at CHOP and Penn, Department of Biomedical and Health Informatics) and Dr. Rich Tsui (Professor of Anesthesiology and Critical Care at CHOP and Penn, Department of Anesthesiology and Critical Care Medicine). With Dr. Grundmeier, I discussed cohort definitions, application of survey weights, and approach for regression analyses. With both Dr. Grundmeier and Dr. Tsui, I discussed the approach for machine learning analyses.

GitHub repository link: Final Project Repo

2 Introduction

Over the past several decades, rates of allergic conditions – including asthma, allergic rhinitis, food allergy, and eczema - have risen substantially worldwide, driven by mechanisms that remain only partially understood [1]. A concurrent increase has occurred in cardiometabolic diseases, such as obesity, hypertension, and diabetes [2]. Allergic, immunologic, and cardiometabolic conditions share comorbidity and, together, reflect immune dysregulation that occurs due to a complex interplay among genetic susceptibility, environmental exposures, and immune-metabolic pathways. There is growing recognition that dietary factors influence immune development beginning early in life and continuing into adulthood. Diet quality, nutrient intake patterns, and anthropometric factors such as adiposity influence inflammation, epithelial barrier function, and the gut microbiome, which in turn affect disease susceptibility [3,4]. While a modulatory role for diet has been been implicated in select allergic (e.g., asthma) and cardiometabolic (e.g., obesity) conditions [5,6], knowledge gaps remain about the potential modulatory role in other allergic and immunologic diseases. This knowledge could help guide how dietary approaches might be integrated into clinical care to prevent or manage these comorbidities.

NHANES is a continuous, nationally representative survey that integrates interviews, physical examinations, and laboratory assessments to characterize the health and nutritional status of the U.S. population. It provides detailed information on demographics, health behaviors, dietary recalls, biomarkers, anthropometrics, and clinical outcomes, making it well suited for population-level nutrition research [7]. The Dietary Inflammatory Index (DII) is an established, literature-derived score that quantifies the inflammatory potential of an individual’s diet based on intake of nutrients and food components known to influence systemic inflammation. To compute standardized DII values in NHANES, this study used the dietaryindexNDP package, which links NHANES dietary recall data with nutrient composition from the U.S. Department of Agriculture’s Food and Nutrient Database for Dietary Studies (FNDDS) and food group equivalents from the Food Patterns Equivalents Database (FPED) [8,9]. Leveraging these tools, I examined associations between dietary, sociodemographic, and clinical factors and a set of allergic-immunologic outcomes (asthma, allergic rhinitis, inflammatory arthritis, eczema, and food allergy) as well as cardiometabolic outcomes (diabetes and hypertension), focusing on adults aged 20-40 years. This age group represents a population that may be particularly amenable to dietary interventions aimed at disease prevention or management. In this study, I applied survey-weighted logistic regression along with exploratory machine learning models to assess whether dietary factors and related covariates are associated with risk across these various conditions that are broadly characterized by immune dysregulation.

This work is inherently multidisciplinary, bridging allergy–immunology, nutrition, epidemiology, and biomedical informatics. Informatics approaches were essential to this project, from extracting and integrating NHANES data to applying the survey design structure and implementing the regression and machine learning analyses. The integration of my training in allergy-immunology, clinical nutrition, epidemiology, statistics, and informatics uniquely equipped me to complete this project and interpret its findings from a cross-disciplinary perspective. This project was supported by mentorship from Drs. Robert Grundmeier and Rich Tsui, whose guidance was instrumental in developing cohort definitions, applying survey-weighted design, and shaping the analytic approach for both the regression and machine learning analyses.

3 Methods

Data source, cohort extraction, and data harmonization

Using NHANES, I performed a cross-sectional analysis focused on non-pregnant adults ages 20-40 years from 2005 to 2012. This age and time range were selected to target a younger adult population that may be more amenable to dietary interventions for prevention or management of chronic diseases, to avoid the increasing medical complexity associated with older age, and to mitigate missingness of key variables that were either relatively more missing or entirely absent in other NHANES cycles. I pooled four continuous two-year NHANES cycles (2005-2006, 2007-2008, 2009-2010, and 2011-2012). For the main analyses (asthma, allergic rhinitis, inflammatory arthritis, diabetes, hypertension), data from the full pooled period (2005–-2012) were used. Because eczema and food allergy were only measured in select cycles, I additionally constructed time-restricted subcohorts for eczema (2005-2006) and food allergy (2007-2010).

Covariates included sociodemographic variables (e.g., age, sex, race/ethnicity), clinical characteristics (e.g., BMI, waist-to-height ratio, vitamin D [25(OH)D] level), and self-reported allergic-immunologic conditions (asthma, allergic rhinitis, inflammatory arthritis [i.e., rheumatoid or psoriatic arthritis], eczema, food allergy) and cardiometabolic diseases (diabetes, hypertension). Two-year survey weights were rescaled to generate weights for analysis of pooled NHANES cycles. Data from multiple NHANES modules were merged into an analytic dataset, and variables were harmonized by recoding them into consistent categories across cycles. Using the dietaryindexNDP package, the Dietary Inflammatory Index (DII) was computed based on 24-hour dietary recall data and was represented as both as a continuous score (with and without alcohol included) and as quartiles (Q1–Q4), ranging from most anti-inflammatory to most pro-inflammatory, as previously described [10]. To reduce the influence of extreme values and to create clinically meaningful comparison groups, select variables (e.g., BMI, vitamin D levels) were converted into quartiles before modeling. I performed data manipulation with the tidyverse package collection, formatted outputs with broom and scales packages, generated figures using the ggplot2 package, and generated tables using the gt package.

All extracted variables, their types (i.e., continuous, categorical, binary), distributions, and missingness are summarized in Table 1. In general, for sociodemographic and behavioral variables, missingness was low, with most variables (e.g., age, sex, education, insurance, smoking, physical activity, family history) showing a missingness of less than 2%, while other variables (e.g. race/ethnicity, alcohol use, poverty-income ratio, food security) showed a range of missingness between 7% and 20%. For cardiometabolic and nutrient labs, the missingness rate was about 8% to 10%. The rate of missingness for primary clinical outcomes was quite low (less than 1%). For food allergy and eczema, the missingness (calculated relative to the full cohort) was expected given the unavailability of these clinical outcomes in many survey cycles. Hence time-restricted comparator cohorts were used for secondary analyses for these specific clinical outcomes. Subsequent regression and machine learning analyses used complete-case data.

Weighted prevalence analysis

Using the survey package, the survey-weighted prevalence of all allergic, immunologic, and cardiometabolic conditions was determined. In addition, to assess how prevalence varied by dietary inflammatory status, survey-weighted prevalence was determined within DII quartiles (Q1-Q4).

Univariable and multivariable regression analyses

Survey-weighted regression analyses were performed using the svyglm function in the survey package. After assessing models for overdispersion, quasibinomial survey-weighted logistic regression was employed. First, univariable analyses were performed to examine associations between each outcome and a core set of sociodemographic, behavioral, clinical, and dietary predictors. An additional exploratory set of univariable analyses evaluated associations between each outcome and individual DII nutrient components. For all univariable models, odds ratios (ORs) with 95% confidence intervals (CIs) were reported, and p values were adjusted using the Benjamini-Hochberg method to account for multiple comparisons.

Next, survey-weighted multivariable regression analyses were performed to evaluate adjusted associations between continuous DII and each disease outcome. Three models were constructed. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI category, physical activity, and smoking status. For each model, the OR, 95% CIs and p values were determined, and results were visualized using Forest plots.

Effect modification analyses

I assessed whether the association between continuous DII and each disease outcome differed by sex, BMI category, or vitamin D category using survey-weighted logistic regression models that included all Model 3 covariates. Sex contributed one interaction term (DII x sex), whereas the multi-category BMI and vitamin D variables contributed multiple interaction terms with DII. I determined stratum-specific ORs and 95% CIs for DII within each modifier level. Statistical significance of effect modification was evaluated using Wald chi-square tests combining all interaction coefficients for each modifier.

Machine learning analyses

I developed exploratory machine learning models to predict disease outcomes (asthma, allergic rhinitis, inflammatory arthritis, hypertension, diabetes, eczema, and food allergy) using the tidymodels set of packages for preprocessing, data sampling, model tuning, and performance evaluation. Two predictor sets were examined. Set 1 included core sociodemographic, behavioral, anthropometric, and clinical variables as well as continuous DII scores (23 variables in total). Set 2 included individual nutrient predictors (26 variables in total). Four algorithms were trained for each outcome, namely logistic regression (LR), LASSO-penalized logistic regression (glmnet package), Random Forest (RF; randomForest package), and XGBoost (XGB; xgboost package). Logistic regression was chosen as an interpretable baseline model, LASSO was chosen for its ability to shrink coefficients and perform variable selection, and RF and XGB were included to capture complex non-linear relationships.

For each outcome and predictor set, data were split into a training set (80%) and a test set (20%). Within the training set, 10-fold cross-validation was used for preprocessing and hyperparameter tuning. Preprocessing steps were implemented using tidymodels recipes. Hyperparameters for LASSO (penalty), RF (mtry), and XGB (number of trees, tree depth, learning rate) were tuned using the cross-validated area under the receiver operating characteristic curve (AUROC). Model performance on the test set was quantified using AUROC. To identify the topmost predictors, I computed model-specific variable importance scores using the vip package. For LR models, variable importance referred to the absolute value of standardized regression coefficients. For LASSO models, variable importance was based on the absolute value of the penalized coefficients after shrinkage. For RF models, variable importance was derived from tree-based split metrics (i.e., increase in node purity associated with each variable). Finally, for XGB, variable importance reflected the gain, i.e., how much each predictor improved the model’s accuracy when used in the boosted trees.

Code to install and load packages

# Install packages
install.packages("broom")
install.packages("ggplot2")
install.packages("glmnet")
install.packages("gt")
install.packages("nhanesA")
install.packages("randomForest")
install.packages("scales")
install.packages("survey")
install.packages("tidyverse")
install.packages("tidymodels")
install.packages("xgboost")
install.packages("vip")

# Install dietaryindex packages
devtools::install_github("zxucmu/dietaryindexNDP", ref = "dietaryindexNDP_1.0.0")
# Load packages
library(broom)
library(dietaryindexNDP)
library(ggplot2)
library(glmnet)
library(gt)
library(nhanesA)
library(randomForest)
library(scales)
library(survey)
library(tidymodels)
library(tidyverse)
library(xgboost)
library(vip)

Code to generate master analytic dataset and summarize variable types, distributions, and missingness

The code below was used to generate the master analytic dataset for subsequent regression and machine learning analyses. Briefly, custom helper functions were used with the nhanesA package to streamline downloading NHANES modules of interest (i.e., demographics, psychosocial variables, medical conditions, anthropometrics, laboratory results, dietary variables, etc.) across four survey cycles (2005–2012). These were helpful as I explored various iterations of surveys cycles and variables to optimize the cohort parameters that I ultimately selected for this project. The set of tidyverse packages was use to clean and merge these data. I then used the dietaryindexNDP package to compute Dietary Inflammatory Index (DII) scores (with and without alcohol) and individual nutrient components. All cleaned modules were joined by participant ID into a single dataset. In this dataset, I harmonized disease outcomes (i.e., asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, hypertension) and converted selected variables (e.g., BMI, vitamin D level) into clinically meaningful categories. The cohort was restricted to non-pregnant adults 20-40 years of age, and NHANES sampling weights were rescaled to account for pooling of survey cycles. Using the survey package, I then generated outcome-specific analytic subsets with corresponding MEC-weighted survey designs for subsequent regression analyses. Finally, I created a data dictionary-style table (Table 1) summarizing all variables, their types, sample sizes, value ranges, categorical levels, and missingness. Additional coding details are included as annotations within the code chunk below.

# Helper functions-----------------------------------------------------------------------------
## Helper function to load NHANES module across survey cycles, including options to restrict to select variables and provide status summaries
load_nhanes_module <- function(module, cycles_vec = cycles,
                               keep_vars = NULL,
                               quiet = TRUE) { # set to FALSE for debugging messages
  module_years <- paste0(module, "_", cycles_vec)
  
  status <- tibble(
    file_tag = module_years,
    ok       = FALSE,
    n_rows   = NA_integer_,
    note     = NA_character_
  )
  
  dfs <- vector("list", length(module_years))
  
  for (i in seq_along(module_years)) {
    x <- module_years[i]
    
    out <- tryCatch(
      {
        if (!quiet) {
          message("Attempting to download: ", x)
        }
        
        df <- nhanes(x) |> as.data.frame()
        
        if (!is.null(keep_vars)) {
          df <- df |> select(any_of(keep_vars))
        }
        
        n <- nrow(df)
        status$n_rows[i] <- n
        
        if (n == 0) {
          status$ok[i]   <- FALSE
          status$note[i] <- "Downloaded, 0 rows"
          if (!quiet) {
            warning("File ", x, " downloaded but has 0 rows.")
          }
          df <- NULL
        } else {
          status$ok[i]   <- TRUE
          status$note[i] <- "Loaded"
          df$file_tag    <- x
        }
        
        df
      },
      error = function(e) {
        status$ok[i]   <- FALSE
        status$note[i] <- paste("ERROR:", conditionMessage(e))
        if (!quiet) {
          warning("File ", x, " could not be downloaded: ", conditionMessage(e))
        }
        return(NULL)
      }
    )
    
    dfs[[i]] <- out
  }
  
  dfs_nonempty <- dfs[!vapply(dfs, is.null, logical(1))]
  
  if (!quiet) {
    message("\nSummary for module ", module, ":")
    print(status)
  }
  
  if (length(dfs_nonempty) == 0) {
    stop("No cycles successfully downloaded for module: ", module)
  }
  
  out <- bind_rows(dfs_nonempty)
  
  ### Missing variables filled with NA
  if (!is.null(keep_vars)) {
    missing_vars <- setdiff(keep_vars, names(out))
    if (length(missing_vars) > 0) {
      for (v in missing_vars) {
        out[[v]] <- NA_real_
      }
    }
  }
  out
}

## Helper function to convert yes/no (1/2) variables into 0/1
to01 <- function(x) {
  x <- as.character(x)
  case_when(
    x %in% c("1", "Yes", "YES", "yes") ~ 1,
    x %in% c("2", "No",  "NO",  "no")  ~ 0,
    TRUE ~ NA_real_
  )
}

## Helper function to convert survey cycle variable survey cycle labels
cycle_label <- function(s) {
  years <- str_extract(as.character(s), "\\d{4}-\\d{4}")
  
  case_when(
    !is.na(years) ~ years,
    TRUE ~ recode(
      as.numeric(s),
      `4` = "2005-2006",
      `5` = "2007-2008",
      `6` = "2009-2010",
      `7` = "2011-2012",
      .default = NA_character_
    )
  )
}

## Helper function to summarize variable availability and missingness by cycle
check_var_availability <- function(data, vars = NULL) {
  
  if (is.null(vars)) {
    vars <- setdiff(
      names(data),
      c(
        "seqn", "wtint2yr", "wtmec2yr", "wtint8yr", "wtmec8yr",
        "sdmvpsu", "sdmvstra", "sddsrvyr", "cycle_str"
      )
    )
  }
  
  map_dfr(vars, function(v) {
    data |>
      group_by(cycle = sddsrvyr) |>
      summarise(
        variable      = v,
        total         = n(),
        n_nonmissing  = sum(!is.na(.data[[v]])),
        n_missing     = sum(is.na(.data[[v]])),
        prop_missing  = mean(is.na(.data[[v]])),
        .groups       = "drop"
      )
  }) |>
    arrange(variable, cycle)
}

## Restrict analysis NHANES cycles spanning 2005–2012 (i.e., D–G)
cycles <- c("D", "E", "F", "G")


# Load NHANES modules--------------------------------------------------------------------------
## Load demographic variables within DEMO module (i.e., demographics, survey design variables)
demo_keep <- c(
  "SEQN",        # Participant ID
  "RIDAGEYR",    # Age
  "RIAGENDR",    # Sex (1 = Male, 2 = Female)
  "RIDRETH1",    # Race/ethnicity (5-level)
  "DMDEDUC2",    # Education level
  "INDFMPIR",    # Poverty-income ratio
  "SDDSRVYR",    # Survey cycle indicator
  "SDMVPSU",     # Primary sampling unit (PSU) for survey design
  "SDMVSTRA",    # Stratification variable for survey design
  "WTINT2YR",    # 2-year interview weight
  "WTMEC2YR",    # 2-year MEC exam weight
  "RIDEXPRG"     # Pregnancy status for females
)
demo <- load_nhanes_module("DEMO", keep_vars = demo_keep)

## Load medical variables within MCQ module (i.e., asthma, family history of asthma, arthritis)
mcq_keep <- c(
  "SEQN",
  "MCQ010",   # Asthma
  "MCQ300B",  # Family history asthma
  "MCQ160A",  # Arthritis yes/no
  "MCQ190",   # Arthritis type (2005–08)
  "MCQ191",   # Arthritis type (2009–10)
  "MCQ195"    # Arthritis type (2011–12)
)
mcq <- load_nhanes_module("MCQ", keep_vars = mcq_keep)

## Load eczema and allergic rhinitis data for 2005–2006 (AGQ module)
agq_d <- load_nhanes_module(
  "AGQ",
  cycles_vec = "D",
  keep_vars = c("SEQN","AGQ010","AGQ180")
)

## Load allergic rhinitis data for 2007–2012 (RDQ module)
rdq <- load_nhanes_module(
  "RDQ",
  cycles_vec = c("E","F","G"),
  keep_vars = c("SEQN","AGQ030")
)

## Load food allergy data (DBQ920) for 2007–2010
dbq <- load_nhanes_module("DBQ", keep_vars = c("SEQN","DBQ920"))

## Load diabetes status (DIQ010)
diq <- load_nhanes_module("DIQ", keep_vars = c("SEQN","DIQ010"))

## Load self-reported hypertension (BPQ020)
bpq <- load_nhanes_module("BPQ", keep_vars = c("SEQN","BPQ020"))

## Load anthropometrics (BMI, waist, height, weight)
bmx <- load_nhanes_module(
  "BMX",
  keep_vars = c("SEQN","BMXBMI","BMXWAIST","BMXHT","BMXWT")
)

## Load individual systolic and diastolic blood pressure readings
bp <- load_nhanes_module(
  "BPX",
  keep_vars = c(
    "SEQN",
    "BPXSY1","BPXSY2","BPXSY3","BPXSY4",
    "BPXDI1","BPXDI2","BPXDI3","BPXDI4"
  )
)

## Load serum vitamin D [25(OH)D] measurements
vid <- load_nhanes_module(
  "VID",
  keep_vars = c("SEQN", "LBXVIDMS", "LBXVIDLC", "LBDVIDMS")
)

## Load lifetime and current smoking status
smq <- load_nhanes_module("SMQ", keep_vars = c("SEQN","SMQ020","SMQ040"))

## Load health insurance coverage
hiq <- load_nhanes_module("HIQ", keep_vars = c("SEQN","HIQ011"))

## Load household food security
fsq <- load_nhanes_module("FSQ", keep_vars = c("SEQN","FSDHH"))

## Load alcohol consumption (ever drinker status, ALQ101)
alq <- load_nhanes_module("ALQ", keep_vars = c("SEQN","ALQ101","ALQ130"))

## Load HbA1c
hba1c <- load_nhanes_module("GHB", keep_vars = c("SEQN","LBXGH"))

## Load total and HDL cholesterol
tchol <- load_nhanes_module("TCHOL", keep_vars = c("SEQN","LBXTC"))
hdl   <- load_nhanes_module("HDL",   keep_vars = c("SEQN","LBDHDD"))

## Load physical activity variables
paq_keep <- c("SEQN","PAQ650","PAQ665","PAD200","PAD320")
paq <- load_nhanes_module("PAQ", keep_vars = paq_keep)

# Dietary Inflammatory Index (DII)-------------------------------------------------------------
## Define NHANES cycles for which DII is available (2005–2012)
dii_cycles <- c("0506", "0708", "0910", "1112")

## Obtain DII results for a single NHANES cycle (day 1 only)
get_dii_cycle <- function(cyc) {
  message("Obtain DII for NHANES cycle ", cyc, " (Day 1)...")
  
  DII_NHANES_PLUS_RESULT(
    NHANESCycle = cyc,
    DAY         = "first"
  ) |>
    mutate(cycle_code_ndp = cyc)
}

## Obtain and pool DII data across all relevant cycles
dii_all <- dii_cycles |>
  map_dfr(get_dii_cycle)

## Retain DII total scores (with and without alcohol) and nutrient components
dii_clean <- dii_all |>
  transmute(
    seqn         = SEQN,
    dii_etoh_raw = DII_ALL,      # DII including alcohol
    dii_raw      = DII_NOETOH,   # DII excluding alcohol
    kcal         = KCAL,
    carb         = CARB,
    protein      = PROTEIN,
    totalfat     = TOTALFAT,
    satfat       = SATFAT,
    pufa         = PUFA,
    mufa         = MUFA,
    n3fat        = N3FAT,
    n6fat        = N6FAT,
    choles       = CHOLES,
    vitA         = VITA,
    bcarotene    = BCAROTENE,
    vitC         = VITC,
    vitE         = VITE,
    vitB6        = VITB6,
    vitB12       = VITB12,
    folicacid    = FOLICACID,
    thiamin      = THIAMIN,
    riboflavin   = RIBOFLAVIN,
    niacin       = NIACIN,
    iron         = IRON,
    mg           = MG,
    zn           = ZN,
    se           = SE,
    fiber        = FIBER,
    caffeine     = CAFFEINE
  ) |>
  mutate(
    dii      = dii_raw,      # DII without alcohol (main exposure)
    dii_etoh = dii_etoh_raw  # DII including alcohol (secondary)
  )

## Define full set of DII component nutrient variables for quartile derivation
dii_component_vars <- c(
  "kcal", "carb", "protein", "totalfat", "satfat",
  "pufa", "mufa", "n3fat", "n6fat", "choles",
  "vitA", "bcarotene", "vitC", "vitE",
  "vitB6", "vitB12", "folicacid",
  "thiamin", "riboflavin", "niacin",
  "iron", "mg", "zn", "se",
  "fiber", "caffeine"
)


# Data cleaning and harmonization -------------------------------------------------------------

## Harmonize physical activity variables across NHANES cycles
paq_clean <- paq |>
  transmute(
    seqn   = SEQN,
    paq650 = PAQ650,   # Vigorous recreational activity ≥10 minutes (2007+)
    paq665 = PAQ665,   # Moderate recreational activity ≥10 minutes (2007+)
    pad200 = PAD200,   # Vigorous activity ≥10 minutes over past 30 days (2005–06)
    pad320 = PAD320    # Moderate activity ≥10 minutes over past 30 days (2005–06)
  )

## Clean demographic and design variables and derive a numeric cycle code
demo_clean <- demo |>
  mutate(
    sddsrvyr_label = as.character(SDDSRVYR),
    sddsrvyr = case_when(
      grepl("2005-2006", sddsrvyr_label) ~ 4,
      grepl("2007-2008", sddsrvyr_label) ~ 5,
      grepl("2009-2010", sddsrvyr_label) ~ 6,
      grepl("2011-2012", sddsrvyr_label) ~ 7,
      TRUE                               ~ NA_real_
    )
  ) |>
  transmute(
    seqn      = SEQN,
    age       = as.numeric(RIDAGEYR),   # Age (years)
    sex_raw   = RIAGENDR,               # Sex
    race_raw  = RIDRETH1,               # Race/ethnicity (1–5)
    educ_raw  = DMDEDUC2,               # Education level
    pir       = INDFMPIR,               # Poverty-income ratio
    sddsrvyr,
    cycle_str = cycle_label(sddsrvyr),  # Cycle label
    sdmvpsu   = SDMVPSU,                # PSU for survey design
    sdmvstra  = SDMVSTRA,               # Stratum for survey design
    wtint2yr  = WTINT2YR,               # Interview weight (2-year)
    wtmec2yr  = WTMEC2YR,               # MEC weight (2-year)
    preg      = RIDEXPRG                # Pregnancy status
  )

## Clean the medical conditions module and derive 0/1 indicators for asthma, family history of asthma, and arthritis
mcq_clean <- mcq |>
  transmute(
    seqn           = SEQN,
    asthma_raw     = MCQ010,    # Asthma
    fh_asthma_raw  = MCQ300B,   # Family history asthma
    arthritis_raw  = MCQ160A,   # Ever arthritis
    mcq190         = MCQ190,    # Arthritis type (2005–08)
    mcq191         = MCQ191,    # Arthritis type (2009–10)
    mcq195         = MCQ195     # Arthritis type (2011–12)
  ) |>
  mutate(
    asthma         = to01(asthma_raw),
    fh_asthma      = to01(fh_asthma_raw),
    arthritis_ever = to01(arthritis_raw)
  )

## Derive eczema status restricted to the 2005–2006 cycle
eczema_clean <- demo_clean |>
  select(seqn, sddsrvyr) |>
  left_join(
    agq_d |>
      transmute(
        seqn       = SEQN,
        agq180_ecz = to01(AGQ180)  # Eczema code
      ),
    by = "seqn"
  ) |>
  mutate(
    ad = if_else(sddsrvyr == 4, agq180_ecz, NA_real_)
  ) |>
  select(seqn, ad)

## Derive food allergy status restricted to the 2007–2010 cycles
fa_clean <- demo_clean |>
  select(seqn, sddsrvyr) |>
  left_join(
    dbq |>
      transmute(
        seqn      = SEQN,
        dbq920_fa = to01(DBQ920)   # Food allergy code
      ),
    by = "seqn"
  ) |>
  mutate(
    fa = if_else(sddsrvyr %in% c(5, 6), dbq920_fa, NA_real_)
  ) |>
  select(seqn, fa)

## Harmonize allergic rhinitis across 2005–2006 and 2007–2012
ar_clean <- demo_clean |>
  select(seqn, sddsrvyr) |>
  left_join(
    agq_d |>
      transmute(seqn = SEQN, ar_agq010 = to01(AGQ010)),  # AR code (2005–06)
    by = "seqn"
  ) |>
  left_join(
    rdq |>
      transmute(seqn = SEQN, ar_agq030_rdq = to01(AGQ030)), # AR code (2007–12)
    by = "seqn"
  ) |>
  mutate(
    ar = case_when(
      sddsrvyr == 4            ~ ar_agq010,
      sddsrvyr %in% c(5, 6, 7) ~ ar_agq030_rdq,
      TRUE                     ~ NA_real_
    )
  ) |>
  select(seqn, ar)

## Derive a 0/1 diabetes indicator from DIQ010
diq_clean <- diq |>
  transmute(
    seqn = SEQN,
    dm2_diag_chr = as.character(DIQ010)
  ) |>
  mutate(
    dm2 = case_when(
      dm2_diag_chr %in% c("1", "Yes")                ~ 1,
      dm2_diag_chr %in% c("2","No","3","Borderline") ~ 0,
      TRUE ~ NA_real_
    )
  ) |>
  select(seqn, dm2)

## Derive 0/1 hypertension indicator from BPQ020
bpq_clean <- bpq |>
  transmute(
    seqn    = SEQN,
    htn_chr = as.character(BPQ020)
  ) |>
  mutate(
    htn = case_when(
      htn_chr %in% c("1","Yes") ~ 1,
      htn_chr %in% c("2","No")  ~ 0,
      TRUE ~ NA_real_
    )
  ) |>
  select(seqn, htn)

## Clean anthropometrics and derive standard measures
bmx_clean <- bmx |>
  transmute(
    seqn      = SEQN,
    bmi       = BMXBMI,   # Body mass index (kg/m^2)
    waist_cm  = BMXWAIST, # Waist circumference (cm)
    height_cm = BMXHT,    # Standing height (cm)
    weight_kg = BMXWT     # Weight (kg)
  )

## Compute mean systolic and diastolic blood pressures across readings
bp_clean <- bp |>
  mutate(
    sbp_mean = rowMeans(across(BPXSY1:BPXSY4), na.rm = TRUE),
    dbp_mean = rowMeans(across(BPXDI1:BPXDI4), na.rm = TRUE),
    dbp_mean = if_else(dbp_mean == 0, NA_real_, dbp_mean)
  ) |>
  transmute(
    seqn     = SEQN,
    sbp_mean = if_else(is.nan(sbp_mean), NA_real_, sbp_mean),
    dbp_mean = if_else(is.nan(dbp_mean), NA_real_, dbp_mean)
  )

## Harmonize vitamin D [25(OH)D] across laboratory methods, taking first non-missing value
vid_clean <- vid |>
  mutate(
    vitD25oh = coalesce(LBXVIDMS, LBXVIDLC, LBDVIDMS)
  ) |>
  transmute(
    seqn     = SEQN,
    vitD25oh = vitD25oh
  )

## Derive smoking status (never, former, current)
smq_clean <- smq |>
  transmute(
    seqn       = SEQN,
    smoked_100 = as.numeric(SMQ020),
    smoke_now  = as.numeric(SMQ040)
  ) |>
  mutate(
    smoking = case_when(
      smoked_100 == 2 ~ "Never",
      smoked_100 == 1 & smoke_now == 3 ~ "Former",
      smoked_100 == 1 & smoke_now %in% c(1,2) ~ "Current",
      TRUE ~ NA_character_
    )
  )

## Derive insurance indicator using HIQ011
hiq_clean <- hiq |>
  transmute(
    seqn    = SEQN,
    insured = to01(HIQ011)  # 1 = insured, 0 = uninsured
  )

## Classify household food security into four ordered categories
fsq_clean <- fsq |>
  transmute(
    seqn = SEQN,
    foodsec_raw = as.numeric(FSDHH),
    food_security = case_when(
      foodsec_raw == 1 ~ "Full",
      foodsec_raw == 2 ~ "Marginal",
      foodsec_raw == 3 ~ "Low",
      foodsec_raw == 4 ~ "Very low",
      TRUE ~ NA_character_
    )
  )

## Alcohol indicator based on ever having more than 12 drinks in any one year (ALQ101)
alq_clean <- alq |>
  transmute(
    seqn        = SEQN,
    any_alcohol = case_when(
      ALQ101 %in% c(1, "1", "Yes") ~ 1,
      ALQ101 %in% c(2, "2", "No")  ~ 0,
      TRUE                         ~ NA_real_
    )
  )

## Retain HbA1c
hba1c_clean <- hba1c |>
  transmute(
    seqn  = SEQN,
    hba1c = LBXGH
  )

## Retain total and HDL cholesterol
tchol_clean <- tchol |> transmute(seqn = SEQN, tchol = LBXTC)
hdl_clean   <- hdl   |> transmute(seqn = SEQN, hdl   = LBDHDD)


# Merge modules and define analytic cohort ----------------------------------------------------

## Merge all cleaned modules to construct a unified NHANES dataset
nhanes_all <- demo_clean |>
  left_join(mcq_clean,    by = "seqn") |>
  left_join(eczema_clean, by = "seqn") |>
  left_join(fa_clean,     by = "seqn") |>
  left_join(ar_clean,     by = "seqn") |>
  left_join(bmx_clean,    by = "seqn") |>
  left_join(bp_clean,     by = "seqn") |>
  left_join(vid_clean,    by = "seqn") |>
  left_join(smq_clean,    by = "seqn") |>
  left_join(hiq_clean,    by = "seqn") |>
  left_join(fsq_clean,    by = "seqn") |>
  left_join(alq_clean,    by = "seqn") |>
  left_join(hba1c_clean,  by = "seqn") |>
  left_join(tchol_clean,  by = "seqn") |>
  left_join(hdl_clean,    by = "seqn") |>
  left_join(dii_clean,    by = "seqn") |>
  left_join(diq_clean,    by = "seqn") |>
  left_join(bpq_clean,    by = "seqn") |>
  left_join(paq_clean,    by = "seqn")

## Restrict the analytic cohort and derive all final variables used in regression and machine learning analyses
nhanes_analytic <- nhanes_all |>
  # Restrict to adults 20–40 years old, 2005–2012
  filter(
    !is.na(sddsrvyr),
    sddsrvyr %in% 4:7,
    age >= 20, age <= 40
  ) |>
  # Exclude pregnant participants at the exam
  filter(
    is.na(preg) |
      !preg %in% c("Yes, positive lab pregnancy test or self-reported pregnant at exam")
  ) |>
  mutate(
    # Multi-cycle weights (pooled 8-year weights over 4 cycles)
    wtint8yr = wtint2yr / length(cycles),
    wtmec8yr = wtmec2yr / length(cycles),
    
    # Sex factor
    sex = case_when(
      sex_raw %in% c(1, "1") ~ "Male",
      sex_raw %in% c(2, "2") ~ "Female",
      as.character(sex_raw) == "Male"   ~ "Male",
      as.character(sex_raw) == "Female" ~ "Female",
      TRUE ~ NA_character_
    ),
    sex = factor(sex, levels = c("Male", "Female")),
    
    # Race/ethnicity factor (RIDRETH1 1–5)
    race = case_when(
      race_raw %in% c(3, "3", "Non-Hispanic White")  ~ "Non-Hispanic White",
      race_raw %in% c(4, "4", "Non-Hispanic Black")  ~ "Non-Hispanic Black",
      race_raw %in% c(1, "1", "Mexican American")    ~ "Mexican American",
      race_raw %in% c(2, "2", "Other Hispanic")      ~ "Other Hispanic",
      race_raw %in% c(5, "5")                        ~ "Other/multiracial",
      TRUE                                           ~ NA_character_
    ),
    race = factor(
      race,
      levels = c(
        "Non-Hispanic White",
        "Non-Hispanic Black",
        "Mexican American",
        "Other Hispanic",
        "Other/multiracial"
      )
    ),
    
    # Education factor (collapsed)
    educ_factor_raw = factor(educ_raw),
    educ = case_when(
      educ_factor_raw %in% c(
        "Less Than 9th Grade",
        "Less than 9th grade"
      ) ~ "<9 years",
      
      educ_factor_raw %in% c(
        "9-11th Grade (Includes 12th grade with no diploma)",
        "9-11th grade (Includes 12th grade with no diploma)",
        "High School Grad/GED or Equivalent",
        "High school graduate/GED or equivalent"
      ) ~ "9–12 years",
      
      educ_factor_raw %in% c(
        "Some College or AA degree",
        "Some college or AA degree",
        "College Graduate or above",
        "College graduate or above"
      ) ~ ">12 years",
      
      TRUE ~ NA_character_
    ),
    educ = factor(
      educ,
      levels  = c("<9 years","9–12 years",">12 years")
    ),
    
    # Smoking factor
    smoking = factor(
      smoking,
      levels = c("Never", "Former", "Current")
    ),
    
    # Physical activity factor (Sedentary, Moderate, Intense), harmonized across cycles
    phys_act = case_when(
      # 2005–2006 (sddsrvyr == 4): PAD200 / PAD320
      sddsrvyr == 4 & to01(pad200) == 1 ~ "Intense",
      sddsrvyr == 4 & to01(pad200) != 1 & to01(pad320) == 1 ~ "Moderate",
      sddsrvyr == 4 & to01(pad200) == 0 & to01(pad320) == 0 ~ "Sedentary",
      
      # 2007–2012 (sddsrvyr %in% 5:7): PAQ650 / PAQ665
      sddsrvyr %in% c(5, 6, 7) & as.character(paq650) == "Yes" ~ "Intense",
      sddsrvyr %in% c(5, 6, 7) &
        as.character(paq650) == "No" & as.character(paq665) == "Yes" ~ "Moderate",
      sddsrvyr %in% c(5, 6, 7) &
        as.character(paq650) == "No" & as.character(paq665) == "No"  ~ "Sedentary",
      
      TRUE ~ NA_character_
    ),
    phys_act = factor(
      phys_act,
      levels = c("Sedentary","Moderate","Intense")
    ),
    
    # Food security factor
    food_sec = factor(
      food_security,
      levels = c("Full","Marginal","Low","Very low"),
      ordered = TRUE
    ),
    
    # Insurance factor
    insur = factor(
      case_when(
        insured == 1 ~ "Yes",
        insured == 0 ~ "No",
        TRUE ~ NA_character_
      ),
      levels = c("No", "Yes")
    ),
    
    # Alcohol factor (no/yes) based on any_alcohol from ALQ101
    alcohol = factor(
      case_when(
        any_alcohol == 0 ~ "No",
        any_alcohol == 1 ~ "Yes",
        TRUE ~ NA_character_
      ),
      levels = c("No", "Yes")
    ),
    
    # Validity flags for outcomes with restricted survey windows
    ad_valid = sddsrvyr == 4,          # Eczema only in 2005–06
    fa_valid = sddsrvyr %in% c(5, 6),  # Food allergy only in 2007–10
    
    # Inflammatory arthritis based on multiple arthritis type questions
    inflam_arthritis = {
      mcq190_chr <- as.character(mcq190)
      mcq191_chr <- as.character(mcq191)
      mcq195_chr <- as.character(mcq195)
      
      case_when(
        # Denial of ever having arthritis
        arthritis_ever == 0 ~ 0,
        
        # 2005–2006 and 2007–2008: MCQ190 arthritis type
        arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
          grepl("Rhe", mcq190_chr, ignore.case = TRUE) ~ 1,
        arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
          grepl("Ost|Osteo|Oth", mcq190_chr, ignore.case = TRUE) ~ 0,
        
        # 2009–2010: MCQ191 arthritis type 
        arthritis_ever == 1 & sddsrvyr == 6 &
          (grepl("Rhe", mcq191_chr, ignore.case = TRUE) |
             grepl("Pso", mcq191_chr, ignore.case = TRUE)) ~ 1,
        arthritis_ever == 1 & sddsrvyr == 6 &
          grepl("Ost|Osteo|Oth", mcq191_chr, ignore.case = TRUE) ~ 0,
        
        # 2011–2012: MCQ195 arthritis type
        arthritis_ever == 1 & sddsrvyr == 7 &
          (grepl("Rhe", mcq195_chr, ignore.case = TRUE) |
             grepl("Pso", mcq195_chr, ignore.case = TRUE)) ~ 1,
        arthritis_ever == 1 & sddsrvyr == 7 &
          grepl("Ost|Osteo|Oth", mcq195_chr, ignore.case = TRUE) ~ 0,
        
        TRUE ~ NA_real_
      )
    },
    
    # Final inflammatory arthritis, diabetes, and hypertension indicators (0/1)
    arthritis = inflam_arthritis,
    diabetes  = dm2,
    htn       = htn,
  
    # Family history of asthma (0/1)
    fh_asthma = fh_asthma,
    
    # Anthropometric ratio and non-HDL cholesterol
    whtr    = waist_cm / height_cm,                            # Waist-to-height ratio
    non_hdl = if_else(!is.na(tchol) & !is.na(hdl), tchol - hdl, NA_real_),  # Non-HDL
    
    # Vitamin D categories
    vitD25oh_cat = case_when(
      is.na(vitD25oh)                  ~ NA_character_,
      vitD25oh < 20                    ~ "<20 (deficient)",
      vitD25oh >= 20 & vitD25oh < 30   ~ "20–29 (insufficient)",
      vitD25oh >= 30                   ~ "≥30 (sufficient)"
    ),
    vitD25oh_cat = factor(
      vitD25oh_cat,
      levels = c("<20 (deficient)",
                 "20–29 (insufficient)",
                 "≥30 (sufficient)"),
      ordered = TRUE
    ),
    
    # DII quartiles (from most anti-inflammatory to most pro-inflammatory)
    dii_quart = ntile(dii, 4),
    dii_quart = factor(
      dii_quart,
      levels = c(1, 2, 3, 4),
      labels = c(
        "Q1 (most anti-inflammatory)",
        "Q2",
        "Q3",
        "Q4 (most pro-inflammatory)"
      ),
      ordered = TRUE
    ),
    
    # Pregnancy status (derived categorical indicator)
    preg_flag = case_when(
      preg == "Yes, positive lab pregnancy test or self-reported pregnant at exam" ~ 1,
      preg %in% c("SP not pregnant at exam",
                  "The participant was not pregnant at exam") ~ 0,
      preg %in% c("Cannot ascertain if SP is pregnant at exam",
                  "Cannot ascertain if the participant is pregnant at exam") ~ NA_real_,
      TRUE ~ NA_real_
    ),
    preg_status = factor(
      case_when(
        preg_flag == 1 ~ "Pregnant",
        preg_flag == 0 ~ "Not pregnant",
        TRUE ~ NA_character_
      ),
      levels = c("Not pregnant", "Pregnant")
    ),
    
    # Quartiles for DII component nutrients (keep only quartile factors downstream)
    across(
      all_of(dii_component_vars),
      ~ ntile(., 4),
      .names = "{.col}_quart"
    ),
    across(
      all_of(paste0(dii_component_vars, "_quart")),
      ~ factor(
        .,
        levels = c(1, 2, 3, 4),
        labels = c("Q1", "Q2", "Q3", "Q4"),
        ordered = TRUE
      )
    ),
    
    # BMI categories
    bmi_cat = case_when(
      is.na(bmi)       ~ NA_character_,
      bmi < 18.5       ~ "Underweight (<18.5)",
      bmi < 25         ~ "Normal (18.5–24.9)",
      bmi < 30         ~ "Overweight (25–29.9)",
      bmi < 35         ~ "Obesity I (30–34.9)",
      bmi < 40         ~ "Obesity II (35–39.9)",
      TRUE             ~ "Obesity III (≥40)"
    ),
    bmi_cat = factor(
      bmi_cat,
      levels = c(
        "Underweight (<18.5)",
        "Normal (18.5–24.9)",
        "Overweight (25–29.9)",
        "Obesity I (30–34.9)",
        "Obesity II (35–39.9)",
        "Obesity III (≥40)"
      ),
      ordered = TRUE
    ),
    
    # Waist-to-height ratio categories
    whtr_cat = case_when(
      is.na(whtr)      ~ NA_character_,
      whtr < 0.5       ~ "<0.5",
      whtr < 0.6       ~ "0.5–0.59",
      TRUE             ~ "≥0.6"
    ),
    whtr_cat = factor(
      whtr_cat,
      levels = c("<0.5", "0.5–0.59", "≥0.6"),
      ordered = TRUE
    ),
    
    # SBP categories
    sbp_cat = case_when(
      is.na(sbp_mean)  ~ NA_character_,
      sbp_mean < 120   ~ "<120",
      sbp_mean < 130   ~ "120–129",
      sbp_mean < 140   ~ "130–139",
      sbp_mean < 160   ~ "140–159",
      TRUE             ~ "≥160"
    ),
    sbp_cat = factor(
      sbp_cat,
      levels = c("<120", "120–129", "130–139", "140–159", "≥160"),
      ordered = TRUE
    ),
    
    # DBP categories
    dbp_cat = case_when(
      is.na(dbp_mean)  ~ NA_character_,
      dbp_mean < 80    ~ "<80",
      dbp_mean < 90    ~ "80–89",
      dbp_mean < 100   ~ "90–99",
      TRUE             ~ "≥100"
    ),
    dbp_cat = factor(
      dbp_cat,
      levels = c("<80", "80–89", "90–99", "≥100"),
      ordered = TRUE
    ),
    
    # Numeric variables for ordered categories (for trend tests and ML)
    dii_quart_trend    = if_else(!is.na(dii_quart),    as.numeric(dii_quart)    - 1, NA_real_),
    food_sec_score     = if_else(!is.na(food_sec),     as.numeric(food_sec)     - 1, NA_real_),
    vitD25oh_cat_trend = if_else(!is.na(vitD25oh_cat), as.numeric(vitD25oh_cat) - 1, NA_real_),
    bmi_cat_trend      = if_else(!is.na(bmi_cat),      as.numeric(bmi_cat)      - 1, NA_real_),
    whtr_cat_trend     = if_else(!is.na(whtr_cat),     as.numeric(whtr_cat)     - 1, NA_real_),
    sbp_cat_trend      = if_else(!is.na(sbp_cat),      as.numeric(sbp_cat)      - 1, NA_real_),
    dbp_cat_trend      = if_else(!is.na(dbp_cat),      as.numeric(dbp_cat)      - 1, NA_real_),
    phys_act_trend     = if_else(!is.na(phys_act),     as.numeric(phys_act)     - 1, NA_real_)
  )


# Finalize analytic dataset for regression and ML ---------------------------------------------
## Construct final analytic dataset that contains all variables required for both regression and machine learning analyses
nhanes_analytic <- nhanes_analytic |>
  select(
    # IDs and survey design
    seqn,
    sddsrvyr, cycle_str, sdmvpsu, sdmvstra,
    wtint2yr, wtmec2yr, wtint8yr, wtmec8yr,
    
    # Demographics 
    age,          # Age
    sex,          # Sex (Male/Female)
    race,         # Race/ethnicity (5 categories)
    educ,         # Education (<9, 9–12, >12 years)
    pir,          # Poverty-income ratio
    
    # Psychosocial and behavioral factors
    smoking,         # Smoking status (Never/Former/Current)
    phys_act,        # Physical activity (Sedentary/Moderate/Intense)
    phys_act_trend,  # Physical activity trend (0–2)
    food_sec,        # Food security (Full/Marginal/Low/Very low)
    food_sec_score,  # Food security score (0–3)
    insur,           # Health insurance (Yes/No)
    any_alcohol,     # Self-reported alcohol use (0/1; from ALQ101)
    alcohol,         # Self-reported alcohol use (No/Yes factor)
    
    # Anthropometrics and cardiometabolic variables
    bmi,          # Body mass index (kg/m^2)  
    whtr,         # Waist-to-height ratio      
    sbp_mean,     # Mean systolic BP (mmHg)    
    dbp_mean,     # Mean diastolic BP (mmHg)   
    tchol,        # Total cholesterol (mg/dL)
    hdl,          # HDL cholesterol (mg/dL)
    non_hdl,      # Non-HDL cholesterol (mg/dL)
    hba1c,        # HbA1c (%)
    
    # Anthropometric and BP categories 
    bmi_cat,
    bmi_cat_trend,  # BMI category trend (0–5)
    whtr_cat,
    whtr_cat_trend, # WHtR category trend (0–2)
    sbp_cat,
    sbp_cat_trend,  # SBP category trend (0–4)
    dbp_cat,
    dbp_cat_trend,  # DBP category trend (0–3)
    
    # Vitamin D and DII summary scores
    vitD25oh,             # Serum Vitamin D [25(OH)D] (nmol/L)
    vitD25oh_cat,         # Vitamin D categories
    vitD25oh_cat_trend,   # Vitamin D category trend (0–2)
    dii,                  # DII (no alcohol)
    dii_quart,            # DII quartiles (ordered)
    dii_quart_trend,      # DII quartile trend (0–3)
    dii_etoh,             # DII including alcohol (secondary)
    
    # DII component nutrient quartiles 
    kcal_quart, carb_quart, protein_quart, totalfat_quart, satfat_quart,
    pufa_quart, mufa_quart, n3fat_quart, n6fat_quart, choles_quart,
    vitA_quart, bcarotene_quart, vitC_quart, vitE_quart,
    vitB6_quart, vitB12_quart, folicacid_quart, thiamin_quart,
    riboflavin_quart, niacin_quart, iron_quart, mg_quart, zn_quart, se_quart,
    fiber_quart, caffeine_quart,
    
    # Disease outcomes and family history (0/1)
    asthma,       # Asthma
    ar,           # Allergic rhinitis
    arthritis,    # Inflammatory arthritis
    diabetes,     # Diabetes 
    htn,          # Hypertension
    ad,           # Eczema (2005–06)
    fa,           # Food allergy (2007–10)
    fh_asthma     # Family history of asthma (0/1)
  )

## Set reference levels for key categorical variables used in regression models
nhanes_analytic <- nhanes_analytic |>
  mutate(
    race         = fct_relevel(as.factor(race),         "Non-Hispanic White"),
    educ         = fct_relevel(as.factor(educ),         ">12 years"),
    vitD25oh_cat = fct_relevel(as.factor(vitD25oh_cat), "≥30 (sufficient)"),
    dii_quart    = fct_relevel(as.factor(dii_quart),    "Q1 (most anti-inflammatory)"),
    bmi_cat      = fct_relevel(as.factor(bmi_cat),      "Normal (18.5–24.9)")
  )


## Save analytic dataset for use in downstream regression and machine learning analyses
saveRDS(nhanes_analytic, file = "nhanes_analytic.rds")

## Add a constant for total population estimates
nhanes_analytic <- nhanes_analytic |> mutate(one = 1)

## Configure survey options for lonely PSUs
options(survey.lonely.psu = "adjust")

## Define interview-weighted survey design (8-year pooled weights)
des_int_8yr <- svydesign(
  ids     = ~sdmvpsu,
  strata  = ~sdmvstra,
  weights = ~wtint8yr,
  nest    = TRUE,
  data    = nhanes_analytic
)

## Define MEC-weighted survey design (8-year pooled weights)
des_mec_8yr <- svydesign(
  ids     = ~sdmvpsu,
  strata  = ~sdmvstra,
  weights = ~wtmec8yr,
  nest    = TRUE,
  data    = nhanes_analytic
)

## Total weighted population using MEC weights (stored)
total_pop_mec_8yr <- svytotal(~one, design = des_mec_8yr)

# Regression dataset generation ---------------------------------------------------------------
## Set survey and contrast options for regression models.
options(survey.lonely.psu = "adjust")
options(contrasts = c("contr.treatment", "contr.treatment"))

## Ensure that constant variable for total population estimates is present
if (!"one" %in% names(nhanes_analytic)) {
  nhanes_analytic <- nhanes_analytic |>
    mutate(one = 1)
}

## Construct outcome-specific analytic datasets restricted to non-missing outcomes
### Asthma (all cycles, non-missing)
asthma_reg <- nhanes_analytic |>
  filter(!is.na(asthma))

### Allergic rhinitis (all cycles, non-missing)
ar_reg <- nhanes_analytic |>
  filter(!is.na(ar))

### Inflammatory arthritis (all cycles, non-missing)
inflam_reg <- nhanes_analytic |>
  filter(!is.na(arthritis))

### Hypertension (all cycles, non-missing)
htn_reg <- nhanes_analytic |>
  filter(!is.na(htn))

### Diabetes (all cycles, non-missing)
dm_reg <- nhanes_analytic |>
  filter(!is.na(diabetes))

### Eczema (2005–2006, non-missing)
ad_reg <- nhanes_analytic |>
  filter(sddsrvyr == 4, !is.na(ad))

### Food allergy (2007–2010, non-missing)
fa_reg <- nhanes_analytic |>
  filter(sddsrvyr %in% c(5, 6), !is.na(fa))

## Define helper to construct outcome-specific survey designs using 8-year MEC weights
make_design_wt8yr <- function(data) {
  svydesign(
    ids     = ~sdmvpsu,
    strata  = ~sdmvstra,
    weights = ~wtmec8yr,
    nest    = TRUE,
    data    = data
  ) |>
    subset(wtmec8yr > 0)
}

## Construct outcome-specific survey designs for regression analyses
des_asthma_reg <- make_design_wt8yr(asthma_reg)
des_ar_reg     <- make_design_wt8yr(ar_reg)
des_inflam_reg <- make_design_wt8yr(inflam_reg)
des_htn_reg    <- make_design_wt8yr(htn_reg)
des_dm_reg     <- make_design_wt8yr(dm_reg)
des_ad_reg     <- make_design_wt8yr(ad_reg)
des_fa_reg     <- make_design_wt8yr(fa_reg)


## Define list structure to organize outcome-specific datasets and survey designs
outcome_designs <- list(
  asthma = list(
    var    = "asthma",
    label  = "Asthma",
    data   = asthma_reg,
    design = des_asthma_reg
  ),
  ar = list(
    var    = "ar",
    label  = "Allergic rhinitis",
    data   = ar_reg,
    design = des_ar_reg
  ),
  inflam = list(
    var    = "arthritis",
    label  = "Inflammatory arthritis",
    data   = inflam_reg,
    design = des_inflam_reg
  ),
  htn = list(
    var    = "htn",
    label  = "Hypertension",
    data   = htn_reg,
    design = des_htn_reg
  ),
  dm = list(
    var    = "diabetes",
    label  = "Diabetes",
    data   = dm_reg,
    design = des_dm_reg
  ),
  ad = list(
    var    = "ad",
    label  = "Eczema",
    data   = ad_reg,
    design = des_ad_reg
  ),
  fa = list(
    var    = "fa",
    label  = "Food allergy",
    data   = fa_reg,
    design = des_fa_reg
  )
)

# Variable information summary-----------------------------------------------------------------
## Full variable labels
var_labels <- c(
  # IDs / design
  seqn      = "Participant ID",
  sddsrvyr  = "Survey cycle",
  cycle_str = "Survey cycle label",
  sdmvpsu   = "Masked variance pseudo-PSU",
  sdmvstra  = "Masked variance pseudo-stratum",
  wtint2yr  = "Interview weight (2-year)",
  wtmec2yr  = "MEC exam weight (2-year)",
  wtint8yr  = "Interview weight (8-year pooled)",
  wtmec8yr  = "MEC exam weight (8-year pooled)",
  
  # Demographics
  age   = "Age in years",
  sex   = "Sex",
  race  = "Race/ethnicity",
  educ  = "Education level",
  pir   = "Poverty-income ratio",
  
  # Behavioral
  smoking         = "Smoking status",
  phys_act        = "Recreational physical activity level",
  phys_act_trend  = "Physical activity trend score",
  food_sec        = "Household food security status",
  food_sec_score  = "Food security ordinal score",
  insur           = "Any health insurance coverage",
  any_alcohol     = "Ever drank ≥12 drinks in any 1 year (ALQ101)",
  alcohol         = "Alcohol use",
  
  # Anthropometrics and cardiometabolic
  bmi       = "Body mass index (kg/m^2)",
  whtr      = "Waist-to-height ratio",
  sbp_mean  = "Mean systolic blood pressure (mmHg)",
  dbp_mean  = "Mean diastolic blood pressure (mmHg)",
  tchol     = "Total cholesterol (mg/dL)",
  hdl       = "HDL cholesterol (mg/dL)",
  non_hdl   = "Non-HDL cholesterol (mg/dL)",
  hba1c     = "Hemoglobin A1c (%)",
  bmi_cat   = "BMI category",
  bmi_cat_trend  = "BMI category trend score",
  whtr_cat       = "Waist-to-height ratio category",
  whtr_cat_trend = "WHtR category trend score",
  sbp_cat        = "Systolic blood pressure category",
  sbp_cat_trend  = "SBP category trend score",
  dbp_cat        = "Diastolic blood pressure category",
  dbp_cat_trend  = "DBP category trend score",
  
  # Vitamin D / DII
  vitD25oh           = "Serum Vitamin D [25(OH)D] concentration (nmol/L)",
  vitD25oh_cat       = "Vitamin D status category",
  vitD25oh_cat_trend = "Vitamin D status trend score",
  dii                = "Dietary Inflammatory Index (no alcohol)",
  dii_quart          = "DII quartile",
  dii_quart_trend    = "DII quartile trend score",
  dii_etoh           = "Dietary Inflammatory Index including alcohol",
  
  # Nutrient quartiles
  kcal_quart      = "Total energy intake (kcal) quartile",
  carb_quart      = "Carbohydrate intake quartile",
  protein_quart   = "Protein intake quartile",
  totalfat_quart  = "Total fat intake quartile",
  satfat_quart    = "Saturated fat intake quartile",
  pufa_quart      = "Polyunsaturated fat intake quartile",
  mufa_quart      = "Monounsaturated fat intake quartile",
  n3fat_quart     = "Omega-3 fat intake quartile",
  n6fat_quart     = "Omega-6 fat intake quartile",
  choles_quart    = "Cholesterol intake quartile",
  vitA_quart      = "Vitamin A intake quartile",
  bcarotene_quart = "Beta-carotene intake quartile",
  vitC_quart      = "Vitamin C intake quartile",
  vitE_quart      = "Vitamin E intake quartile",
  vitB6_quart     = "Vitamin B6 intake quartile",
  vitB12_quart    = "Vitamin B12 intake quartile",
  folicacid_quart = "Folic acid intake quartile",
  thiamin_quart   = "Thiamin intake quartile",
  riboflavin_quart = "Riboflavin intake quartile",
  niacin_quart    = "Niacin intake quartile",
  iron_quart      = "Iron intake quartile",
  mg_quart        = "Magnesium intake quartile",
  zn_quart        = "Zinc intake quartile",
  se_quart        = "Selenium intake quartile",
  fiber_quart     = "Fiber intake quartile",
  caffeine_quart  = "Caffeine intake quartile",
  
  # Outcomes and related variables
  asthma    = "Reported asthma",
  ar        = "Reported allergic rhinitis (hay fever)",
  arthritis = "Reported rheumatoid or psoriatic arthritis",
  diabetes  = "Reported diabetes",
  htn       = "Reported hypertension",
  ad        = "Reported eczema (atopic dermatitis; 2005–2006 only)",
  fa        = "Reported food allergy",
  fh_asthma = "Family history of asthma"
)

## Define variable categories (design, core predictors, nutrient predictors,
## main exposures, outcomes, and selected covariates).
var_roles <- c(
  # Design / ID
  seqn      = "id",
  sddsrvyr  = "design",
  cycle_str = "design",
  sdmvpsu   = "design",
  sdmvstra  = "design",
  wtint2yr  = "weight_2yr",
  wtmec2yr  = "weight_2yr",
  wtint8yr  = "weight_8yr",
  wtmec8yr  = "weight_8yr",
  
  # Core predictors
  age   = "predictor_core",
  sex   = "predictor_core",
  race  = "predictor_core",
  educ  = "predictor_core",
  pir   = "predictor_core",
  
  smoking         = "predictor_core",
  phys_act        = "predictor_core",
  phys_act_trend  = "predictor_core",
  food_sec        = "predictor_core",
  food_sec_score  = "predictor_core",
  insur           = "predictor_core",
  any_alcohol     = "predictor_core",
  alcohol         = "predictor_core",
  
  bmi       = "predictor_core",
  whtr      = "predictor_core",
  sbp_mean  = "predictor_core",
  dbp_mean  = "predictor_core",
  tchol     = "predictor_core",
  hdl       = "predictor_core",
  non_hdl   = "predictor_core",
  hba1c     = "predictor_core",
  bmi_cat   = "predictor_core",
  bmi_cat_trend  = "predictor_core",
  whtr_cat       = "predictor_core",
  whtr_cat_trend = "predictor_core",
  sbp_cat        = "predictor_core",
  sbp_cat_trend  = "predictor_core",
  dbp_cat        = "predictor_core",
  dbp_cat_trend  = "predictor_core",
  
  # Vitamin D / DII exposure domain
  vitD25oh           = "predictor_core",
  vitD25oh_cat       = "predictor_core",
  vitD25oh_cat_trend = "predictor_core",
  dii                = "exposure_DII_main",
  dii_quart          = "exposure_DII_ordinal",
  dii_quart_trend    = "exposure_DII_trend",
  dii_etoh           = "exposure_DII_secondary",
  
  # Nutrient quartiles
  kcal_quart      = "predictor_nutrient",
  carb_quart      = "predictor_nutrient",
  protein_quart   = "predictor_nutrient",
  totalfat_quart  = "predictor_nutrient",
  satfat_quart    = "predictor_nutrient",
  pufa_quart      = "predictor_nutrient",
  mufa_quart      = "predictor_nutrient",
  n3fat_quart     = "predictor_nutrient",
  n6fat_quart     = "predictor_nutrient",
  choles_quart    = "predictor_nutrient",
  vitA_quart      = "predictor_nutrient",
  bcarotene_quart = "predictor_nutrient",
  vitC_quart      = "predictor_nutrient",
  vitE_quart      = "predictor_nutrient",
  vitB6_quart     = "predictor_nutrient",
  vitB12_quart    = "predictor_nutrient",
  folicacid_quart = "predictor_nutrient",
  thiamin_quart   = "predictor_nutrient",
  riboflavin_quart = "predictor_nutrient",
  niacin_quart    = "predictor_nutrient",
  iron_quart      = "predictor_nutrient",
  mg_quart        = "predictor_nutrient",
  zn_quart        = "predictor_nutrient",
  se_quart        = "predictor_nutrient",
  fiber_quart     = "predictor_nutrient",
  caffeine_quart  = "predictor_nutrient",
  
  # Outcomes and related covariates
  asthma    = "outcome_primary",
  ar        = "outcome_primary",
  arthritis = "outcome_primary",
  diabetes  = "outcome_primary",
  htn       = "outcome_primary",
  ad        = "outcome_secondary",
  fa        = "outcome_secondary",
  fh_asthma = "predictor_family_history"
)

## Helper function to generate variable summary for data dictionary table
make_var_summary <- function(data, vars = names(data),
                             var_labels = NULL) {
  map_dfr(vars, function(v) {
    x <- data[[v]]
    n <- length(x)
    n_missing <- sum(is.na(x))
    pct_missing <- if (n > 0) n_missing / n else NA_real_
    
    type <- if (is.factor(x)) {
      "factor"
    } else if (is.character(x)) {
      "character"
    } else if (is.numeric(x)) {
      "numeric"
    } else if (is.logical(x)) {
      "logical"
    } else {
      paste(class(x), collapse = ",")
    }
    
    if (is.numeric(x)) {
      rng <- suppressWarnings(range(x, na.rm = TRUE))
      min_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[1] else NA_real_
      max_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[2] else NA_real_
      levels_info <- NA_character_
    } else {
      min_val <- NA_real_
      max_val <- NA_real_
      
      if (is.factor(x)) {
        lv <- levels(x)
      } else {
        lv <- unique(x)
        lv <- lv[!is.na(lv)]
        lv <- as.character(lv)
      }
      
      lv <- head(lv, 20L)
      levels_info <- if (length(lv) == 0) NA_character_ else paste(lv, collapse = "; ")
    }
    
    label <- if (!is.null(var_labels) && !is.null(var_labels[[v]])) {
      var_labels[[v]]
    } else {
      v
    }
    
    tibble(
      variable    = v,
      label       = label,
      type        = type,
      n           = n,
      n_missing   = n_missing,
      pct_missing = pct_missing,
      min         = min_val,
      max         = max_val,
      levels      = levels_info
    )
  })
}

## Specify variables to include in data dictionary table
vars_for_dict <- c(
  # Design / ID
  "seqn","sddsrvyr","cycle_str","sdmvpsu","sdmvstra",
  "wtint2yr","wtmec2yr","wtint8yr","wtmec8yr",
  
  # Demographics
  "age","sex","race","educ","pir",
  
  # Behaviours / psychosocial
  "smoking","phys_act","phys_act_trend",
  "food_sec",
  "insur","alcohol",
  
  # Anthropometrics & cardio-metabolic
  "bmi","whtr","sbp_mean","dbp_mean",
  "tchol","hdl","non_hdl","hba1c",
  "bmi_cat","bmi_cat_trend",
  "whtr_cat","whtr_cat_trend",
  "sbp_cat","sbp_cat_trend",
  "dbp_cat","dbp_cat_trend",
  
  # Vitamin D / DII
  "vitD25oh","vitD25oh_cat","vitD25oh_cat_trend",
  "dii","dii_quart","dii_quart_trend",
  "dii_etoh",
  
  # Nutrient quartiles
  "kcal_quart","carb_quart","protein_quart","totalfat_quart","satfat_quart",
  "pufa_quart","mufa_quart","n3fat_quart","n6fat_quart","choles_quart",
  "vitA_quart","bcarotene_quart","vitC_quart","vitE_quart",
  "vitB6_quart","vitB12_quart","folicacid_quart","thiamin_quart",
  "riboflavin_quart","niacin_quart","iron_quart","mg_quart","zn_quart","se_quart",
  "fiber_quart","caffeine_quart",
  
  # Outcomes and related variables
  "asthma","ar","arthritis","diabetes","htn","ad","fa",
  "fh_asthma"
)

## Create variable summary for table
var_summary <- make_var_summary(
  data       = nhanes_analytic,
  vars       = vars_for_dict,
  var_labels = var_labels
)

## Define grouping of variables for Table 1
table1_group_map <- c(
  seqn        = "Demographics",
  sddsrvyr    = "Demographics",
  cycle_str   = "Demographics",
  sdmvpsu     = "Demographics",
  sdmvstra    = "Demographics",
  wtint2yr    = "Demographics",
  wtmec2yr    = "Demographics",
  wtint8yr    = "Demographics",
  wtmec8yr    = "Demographics",
  age         = "Demographics",
  sex         = "Demographics",
  race        = "Demographics",
  educ        = "Demographics",
  pir         = "Demographics",
  smoking     = "Demographics",
  phys_act    = "Demographics",
  phys_act_trend = "Demographics",
  food_sec    = "Demographics",
  insur       = "Demographics",
  alcohol     = "Demographics",
  fh_asthma   = "Demographics",
  
  bmi             = "Anthropometrics",
  whtr            = "Anthropometrics",
  sbp_mean        = "Anthropometrics",
  dbp_mean        = "Anthropometrics",
  tchol           = "Anthropometrics",
  hdl             = "Anthropometrics",
  non_hdl         = "Anthropometrics",
  hba1c           = "Anthropometrics",
  bmi_cat         = "Anthropometrics",
  bmi_cat_trend   = "Anthropometrics",
  whtr_cat        = "Anthropometrics",
  whtr_cat_trend  = "Anthropometrics",
  sbp_cat         = "Anthropometrics",
  sbp_cat_trend   = "Anthropometrics",
  dbp_cat         = "Anthropometrics",
  dbp_cat_trend   = "Anthropometrics",
  
  vitD25oh           = "Nutrients",
  vitD25oh_cat       = "Nutrients",
  vitD25oh_cat_trend = "Nutrients",
  dii                = "Nutrients",
  dii_quart          = "Nutrients",
  dii_quart_trend    = "Nutrients",
  dii_etoh           = "Nutrients",
  kcal_quart         = "Nutrients",
  carb_quart         = "Nutrients",
  protein_quart      = "Nutrients",
  totalfat_quart     = "Nutrients",
  satfat_quart       = "Nutrients",
  pufa_quart         = "Nutrients",
  mufa_quart         = "Nutrients",
  n3fat_quart        = "Nutrients",
  n6fat_quart        = "Nutrients",
  choles_quart       = "Nutrients",
  vitA_quart         = "Nutrients",
  bcarotene_quart    = "Nutrients",
  vitC_quart         = "Nutrients",
  vitE_quart         = "Nutrients",
  vitB6_quart        = "Nutrients",
  vitB12_quart       = "Nutrients",
  folicacid_quart    = "Nutrients",
  thiamin_quart      = "Nutrients",
  riboflavin_quart   = "Nutrients",
  niacin_quart       = "Nutrients",
  iron_quart         = "Nutrients",
  mg_quart           = "Nutrients",
  zn_quart           = "Nutrients",
  se_quart           = "Nutrients",
  fiber_quart        = "Nutrients",
  caffeine_quart     = "Nutrients",
  
  asthma    = "Disease Outcomes",
  ar        = "Disease Outcomes",
  arthritis = "Disease Outcomes",
  diabetes  = "Disease Outcomes",
  htn       = "Disease Outcomes",
  ad        = "Disease Outcomes",
  fa        = "Disease Outcomes"
)

## Define helper function to identify binary numeric variables
identify_binary <- function(x) {
  vals <- unique(na.omit(x))
  length(vals) <= 2 && all(vals %in% c(0, 1))
}

## Prepare Table 1
table1_df <- var_summary |>
  mutate(
    # Binary 0/1 variable handling
    binary_flag = map_lgl(variable, ~ identify_binary(nhanes_analytic[[.x]])),
    Type = case_when(
      binary_flag ~ "binary",
      type == "numeric" ~ "numeric",
      TRUE ~ type
    ),
    N           = as.integer(n),
    N_missing   = as.integer(n_missing),
    group       = table1_group_map[variable]
  ) |>
  select(
    group,
    Variable    = variable,
    Content     = label,
    Type,
    N,
    N_missing,
    pct_missing,
    Min         = min,
    Max         = max,
    Levels      = levels,
    binary_flag
  ) |>
  mutate(
    Min    = if_else(binary_flag, 0, Min),
    Max    = if_else(binary_flag, 1, Max),
    Levels = if_else(binary_flag, "0=No; 1=Yes", Levels),
    # Fine-tune levels of certain variables
    Levels = case_when(
      Variable == "sddsrvyr"     ~ "4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12",
      Variable == "bmi_cat"      ~ "Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40)",
      Variable == "vitD25oh_cat" ~ "<20 (deficient); 20–29 (insufficient); ≥30 (sufficient)",
      Variable == "educ"         ~ "<9 years; 9–12 years; >12 years",
      TRUE ~ Levels
    )
  ) |>
  arrange(
    factor(
      group,
      levels = c("Demographics", "Anthropometrics", "Nutrients", "Disease Outcomes")
    ),
    Variable
  ) |>
  select(
    group,
    Variable,
    Content,
    Type,
    N,
    `N missing` = N_missing,
    `% missing` = pct_missing,
    Min,
    Max,
    Levels
  )

## Generate Table 1
labs_vars <- c("bmi","whtr","tchol","hdl","non_hdl","hba1c","vitD25oh","dii","dii_etoh")

nhanes_table1_gt <- table1_df |>
  gt(groupname_col = "group") |>
  tab_header(
    title = md("**Table 1: Extracted Variables: Types, Distributions, and Missingness**")
  ) |>
  cols_label(
    N = md("N"),
    `N missing` = md("N missing"),
    `% missing` = md("% missing")
  ) |>
  # % missing always with 3 decimals
  fmt_number(
    columns = `% missing`,
    decimals = 3
  ) |>
  # Min/Max for selected continuous variables: 1 decimal
  fmt_number(
    columns = c(Min, Max),
    decimals = 1,
    rows = Variable %in% labs_vars & Type != "binary"
  ) |>
  # Min/Max for all other non-binary numeric variables: 0 decimals
  fmt_number(
    columns = c(Min, Max),
    decimals = 0,
    rows = !(Variable %in% labs_vars) & Type != "binary"
  ) |>
  # Min/Max for binary 0/1 variables: 0 decimals (0 and 1)
  fmt_number(
    columns = c(Min, Max),
    decimals = 0,
    rows = Type == "binary"
  ) |>
  # For seqn, suppress thousand separators
  fmt_number(
    columns = c(Min, Max),
    decimals = 0,
    use_seps = FALSE,
    rows = Variable == "seqn"
  ) |>
  cols_align(align = "left") |>
  tab_options(table.font.size = px(12)) |>
  # Bold column headings and row group labels
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = list(
      cells_column_labels(everything()),
      cells_row_groups()
    )
  )

# Additional adjustments for regression analyses-----------------------------------------------
## Add trend variables to outcome-specific survey designs
add_trend_vars_to_design <- function(design) {
  vars <- design$variables

  if ("vitD25oh_cat" %in% names(vars) && !"vitD25oh_cat_trend" %in% names(vars)) {
    vars <- vars |>
      mutate(
        vitD25oh_cat_trend = if_else(
          !is.na(vitD25oh_cat),
          as.numeric(vitD25oh_cat) - 1,
          NA_real_
        )
      )
  }

  if ("dii_quart" %in% names(vars) && !"dii_quart_trend" %in% names(vars)) {
    vars <- vars |>
      mutate(
        dii_quart_trend = if_else(
          !is.na(dii_quart),
          as.numeric(dii_quart) - 1,
          NA_real_
        )
      )
  }

  design$variables <- vars
  design
}

des_asthma_reg  <- add_trend_vars_to_design(des_asthma_reg)
des_ar_reg      <- add_trend_vars_to_design(des_ar_reg)
des_inflam_reg  <- add_trend_vars_to_design(des_inflam_reg)
des_htn_reg     <- add_trend_vars_to_design(des_htn_reg)
des_dm_reg      <- add_trend_vars_to_design(des_dm_reg)
des_ad_reg      <- add_trend_vars_to_design(des_ad_reg)
des_fa_reg      <- add_trend_vars_to_design(des_fa_reg)

## Full cohort design for characteristics and prevalence
make_design_wt8yr <- function(data) {
  svydesign(
    ids     = ~sdmvpsu,
    strata  = ~sdmvstra,
    weights = ~wtmec8yr,
    nest    = TRUE,
    data    = data
  ) |>
    subset(wtmec8yr > 0)
}

des_full <- make_design_wt8yr(nhanes_analytic)

## Outcome lists
outcome_list <- list(
  asthma = list(
    var    = "asthma",
    design = des_asthma_reg,
    label  = "Asthma"
  ),
  ar = list(
    var    = "ar",
    design = des_ar_reg,
    label  = "Allergic rhinitis"
  ),
  inflam = list(
    var    = "arthritis",
    design = des_inflam_reg,
    label  = "Inflammatory arthritis"
  ),
  htn = list(
    var    = "htn",
    design = des_htn_reg,
    label  = "Hypertension"
  ),
  dm = list(
    var    = "diabetes",
    design = des_dm_reg,
    label  = "Diabetes"
  )
)

outcome_list_secondary <- list(
  ad = list(
    var    = "ad",
    design = des_ad_reg,
    label  = "Atopic dermatitis"
  ),
  fa = list(
    var    = "fa",
    design = des_fa_reg,
    label  = "Food allergy"
  )
)

## Labels for core covariates and DII-related predictors
uni1_labels <- c(
  age                 = "Age (years)",
  sex                 = "Sex",
  race                = "Race/ethnicity",
  insur               = "Insurance",
  educ                = "Education",
  pir                 = "Poverty-income ratio",
  food_sec            = "Food security",
  smoking             = "Smoking status",
  phys_act            = "Physical activity",
  alcohol             = "Alcohol use",
  sbp_cat             = "Systolic blood pressure category",
  dbp_cat             = "Diastolic blood pressure category",
  bmi_cat             = "BMI category",
  whtr_cat            = "Waist-to-height ratio category",
  hdl                 = "HDL cholesterol (mg/dL)",
  non_hdl             = "Non-HDL cholesterol (mg/dL)",
  tchol               = "Total cholesterol (mg/dL)",
  hba1c               = "HbA1c (%)",
  vitD25oh_cat        = "25(OH)D category",
  vitD25oh_cat_trend  = "25(OH)D category trend (0–2)",
  dii                 = "Dietary Inflammatory Index (per unit)",
  dii_quart_trend     = "DII quartile trend (0–3)",
  dii_etoh            = "DII including alcohol (per unit)"
)

## Mapping of factor levels to labels for characteristics and univariable models
custom_order_uni1 <- tribble(
  ~predictor, ~level,                                  ~variable_label,                                         ~rank,
  # Sex
  "sex",      "Male",                                  "Sex: Male",                                            1,
  "sex",      "Female",                                "Sex: Female",                                          2,

  # Race (matches nhanes_analytic$race levels)
  "race",     "Non-Hispanic White",                    "Race/ethnicity: Non-Hispanic White",                   1,
  "race",     "Non-Hispanic Black",                    "Race/ethnicity: Non-Hispanic Black",                   2,
  "race",     "Mexican American",                      "Race/ethnicity: Mexican American",                     3,
  "race",     "Other Hispanic",                        "Race/ethnicity: Other Hispanic",                       4,
  "race",     "Other/multiracial",                     "Race/ethnicity: Other race",                           5,

  # Insurance
  "insur",    "No",                                    "Insurance: No",                                        1,
  "insur",    "Yes",                                   "Insurance: Yes",                                       2,

  # Education (matches nhanes_analytic$educ levels)
  "educ",     "<9 years",                              "Education: <9",                                        1,
  "educ",     "9–12 years",                            "Education: 9-12",                                      2,
  "educ",     ">12 years",                             "Education: >12",                                       3,

  # Food security
  "food_sec", "Full",                                  "Food security: Full",                                  1,
  "food_sec", "Marginal",                              "Food security: Marginal",                              2,
  "food_sec", "Low",                                   "Food security: Low",                                   3,
  "food_sec", "Very low",                              "Food security: Very low",                              4,

  # Smoking
  "smoking",  "Never",                                 "Smoking status: Never",                                1,
  "smoking",  "Current",                               "Smoking status: Current",                              2,
  "smoking",  "Former",                                "Smoking status: Former",                               3,

  # Alcohol
  "alcohol",  "No",                                    "Alcohol use: No",                                      1,
  "alcohol",  "Yes",                                   "Alcohol use: Yes",                                     2,

  # Physical activity
  "phys_act", "Sedentary",                             "Physical activity: Sedentary",                         1,
  "phys_act", "Moderate",                              "Physical activity: Moderate",                          2,
  "phys_act", "Intense",                               "Physical activity: Intense",                           3,

  # SBP categories (match levels in nhanes_analytic)
  "sbp_cat",  "<120",                                  "Systolic blood pressure category: <120",               1,
  "sbp_cat",  "120–129",                               "Systolic blood pressure category: 120-129",            2,
  "sbp_cat",  "130–139",                               "Systolic blood pressure category: 130-139",            3,
  "sbp_cat",  "140–159",                               "Systolic blood pressure category: 140-159",           4,
  "sbp_cat",  "≥160",                                  "Systolic blood pressure category: >=160",              5,

  # DBP categories
  "dbp_cat",  "<80",                                   "Diastolic blood pressure category: <80",               1,
  "dbp_cat",  "80–89",                                 "Diastolic blood pressure category: 80-89",             2,
  "dbp_cat",  "90–99",                                 "Diastolic blood pressure category: 90-99",             3,
  "dbp_cat",  "≥100",                                  "Diastolic blood pressure category: >=100",             4,

  # BMI categories
  "bmi_cat",  "Underweight (<18.5)",                   "BMI category: Underweight",                            1,
  "bmi_cat",  "Normal (18.5–24.9)",                    "BMI category: Normal weight",                          2,
  "bmi_cat",  "Overweight (25–29.9)",                  "BMI category: Overweight",                             3,
  "bmi_cat",  "Obesity I (30–34.9)",                   "BMI category: Obesity I",                              4,
  "bmi_cat",  "Obesity II (35–39.9)",                  "BMI category: Obesity II",                             5,
  "bmi_cat",  "Obesity III (≥40)",                     "BMI category: Obesity III",                            6,

  # WHtR categories
  "whtr_cat", "<0.5",                                  "Waist-to-height ratio category: <0.5",                 1,
  "whtr_cat", "0.5–0.59",                              "Waist-to-height ratio category: 0.5-0.59",             2,
  "whtr_cat", "≥0.6",                                  "Waist-to-height ratio category: >=0.6",                3,

  # Vitamin D categories
  "vitD25oh_cat", "<20 (deficient)",                   "25(OH)D category: <20",                                1,
  "vitD25oh_cat", "20–29 (insufficient)",              "25(OH)D category: 20-29",                              2,
  "vitD25oh_cat", "≥30 (sufficient)",                  "25(OH)D category: >=30",                               3
)

is_poly_term <- function(terms, predictor) {
  any(grepl(paste0("^", predictor, "\\."), terms))
}

add_ref_and_label <- function(td, des, predictor, outcome_name, outcome_info, base_label = NULL) {
  if (is.null(base_label)) {
    base_label <- predictor
  }

  var_data <- des$variables[[predictor]]

  ## Numeric predictors have no reference row, only single row with label
  if (is.numeric(var_data) || is.integer(var_data)) {
    td <- td |>
      mutate(variable_label = base_label)
    return(td)
  }

  ## Character predictors converted to factor
  if (is.character(var_data)) {
    var_data <- factor(var_data)
  }

  ## Detect polynomial terms and assign linear or quadratic trend
  if (is_poly_term(td$term, predictor)) {
    td <- td |>
      mutate(
        variable_label = case_when(
          grepl("\\.L$", term) ~ paste0(base_label, ": linear trend"),
          grepl("\\.Q$", term) ~ paste0(base_label, ": quadratic trend"),
          TRUE                 ~ paste0(base_label, ": ", term)
        )
      )
    return(td)
  }

  ## Add reference row and level labels for factor predictors
  if (is.factor(var_data)) {
    levels_vec <- levels(var_data)
    pattern    <- paste0("^", predictor)

    td_nonref <- td |>
      mutate(
        level          = sub(pattern, "", term),
        variable_label = paste0(base_label, ": ", level)
      )

    ref_level <- levels_vec[1]
    ref_label <- paste0(base_label, ": ", ref_level)

    ref_row <- tibble(
      term           = paste0(predictor, ref_level),
      estimate       = 1,
      conf.low       = 1,
      conf.high      = 1,
      p.value        = NA_real_,
      outcome        = outcome_name,
      outcome_lab    = outcome_info$label,
      predictor      = predictor,
      level          = ref_level,
      variable_label = ref_label
    )

    td_full <- bind_rows(ref_row, td_nonref)
    return(td_full)
  }
  td |>
    mutate(variable_label = base_label)
}

# Print Table 1 variable summary---------------------------------------------------------------
nhanes_table1_gt
Table 1: Extracted Variables: Types, Distributions, and Missingness
Variable Content Type N N missing Min Max Levels
Demographics
age Age in years numeric 7803 0 0.000 20 40 NA
alcohol Alcohol use factor 7803 1192 0.153 NA NA No; Yes
cycle_str Survey cycle label character 7803 0 0.000 NA NA 2005-2006; 2007-2008; 2009-2010; 2011-2012
educ Education level factor 7803 7 0.001 NA NA <9 years; 9–12 years; >12 years
fh_asthma Family history of asthma binary 7803 131 0.017 0 1 0=No; 1=Yes
food_sec Household food security status factor 7803 1539 0.197 NA NA Full; Marginal; Low; Very low
insur Any health insurance coverage factor 7803 7 0.001 NA NA No; Yes
phys_act Recreational physical activity level factor 7803 21 0.003 NA NA Sedentary; Moderate; Intense
phys_act_trend Physical activity trend score numeric 7803 21 0.003 0 2 NA
pir Poverty-income ratio numeric 7803 565 0.072 0 5 NA
race Race/ethnicity factor 7803 742 0.095 NA NA Non-Hispanic White; Non-Hispanic Black; Mexican American; Other Hispanic; Other/multiracial
sddsrvyr Survey cycle numeric 7803 0 0.000 4 7 4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12
sdmvpsu Masked variance pseudo-PSU numeric 7803 0 0.000 1 3 NA
sdmvstra Masked variance pseudo-stratum numeric 7803 0 0.000 44 103 NA
seqn Participant ID numeric 7803 0 0.000 31144 71912 NA
sex Sex factor 7803 0 0.000 NA NA Male; Female
smoking Smoking status factor 7803 7 0.001 NA NA Never; Former; Current
wtint2yr Interview weight (2-year) numeric 7803 0 0.000 1,339 152,859 NA
wtint8yr Interview weight (8-year pooled) numeric 7803 0 0.000 335 38,215 NA
wtmec2yr MEC exam weight (2-year) numeric 7803 0 0.000 0 163,393 NA
wtmec8yr MEC exam weight (8-year pooled) numeric 7803 0 0.000 0 40,848 NA
Anthropometrics
bmi Body mass index (kg/m^2) numeric 7803 347 0.044 14.7 76.1 NA
bmi_cat BMI category factor 7803 347 0.044 NA NA Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40)
bmi_cat_trend BMI category trend score numeric 7803 347 0.044 0 5 NA
dbp_cat Diastolic blood pressure category factor 7803 715 0.092 NA NA <80; 80–89; 90–99; ≥100
dbp_cat_trend DBP category trend score numeric 7803 715 0.092 0 3 NA
dbp_mean Mean diastolic blood pressure (mmHg) numeric 7803 715 0.092 3 125 NA
hba1c Hemoglobin A1c (%) numeric 7803 770 0.099 3.6 15.9 NA
hdl HDL cholesterol (mg/dL) numeric 7803 811 0.104 7.0 149.0 NA
non_hdl Non-HDL cholesterol (mg/dL) numeric 7803 811 0.104 27.0 494.0 NA
sbp_cat Systolic blood pressure category factor 7803 708 0.091 NA NA <120; 120–129; 130–139; 140–159; ≥160
sbp_cat_trend SBP category trend score numeric 7803 708 0.091 0 4 NA
sbp_mean Mean systolic blood pressure (mmHg) numeric 7803 708 0.091 80 196 NA
tchol Total cholesterol (mg/dL) numeric 7803 811 0.104 59.0 523.0 NA
whtr Waist-to-height ratio numeric 7803 633 0.081 0.4 1.1 NA
whtr_cat Waist-to-height ratio category factor 7803 633 0.081 NA NA <0.5; 0.5–0.59; ≥0.6
whtr_cat_trend WHtR category trend score numeric 7803 633 0.081 0 2 NA
Nutrients
bcarotene_quart Beta-carotene intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
caffeine_quart Caffeine intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
carb_quart Carbohydrate intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
choles_quart Cholesterol intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
dii Dietary Inflammatory Index (no alcohol) numeric 7803 745 0.095 −5.7 8.1 NA
dii_etoh Dietary Inflammatory Index including alcohol numeric 7803 745 0.095 −5.7 8.3 NA
dii_quart DII quartile factor 7803 745 0.095 NA NA Q1 (most anti-inflammatory); Q2; Q3; Q4 (most pro-inflammatory)
dii_quart_trend DII quartile trend score numeric 7803 745 0.095 0 3 NA
fiber_quart Fiber intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
folicacid_quart Folic acid intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
iron_quart Iron intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
kcal_quart Total energy intake (kcal) quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
mg_quart Magnesium intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
mufa_quart Monounsaturated fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
n3fat_quart Omega-3 fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
n6fat_quart Omega-6 fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
niacin_quart Niacin intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
protein_quart Protein intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
pufa_quart Polyunsaturated fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
riboflavin_quart Riboflavin intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
satfat_quart Saturated fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
se_quart Selenium intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
thiamin_quart Thiamin intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
totalfat_quart Total fat intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
vitA_quart Vitamin A intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
vitB12_quart Vitamin B12 intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
vitB6_quart Vitamin B6 intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
vitC_quart Vitamin C intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
vitD25oh Serum Vitamin D [25(OH)D] concentration (nmol/L) numeric 7803 1054 0.135 6.2 213.0 NA
vitD25oh_cat Vitamin D status category factor 7803 1054 0.135 NA NA <20 (deficient); 20–29 (insufficient); ≥30 (sufficient)
vitD25oh_cat_trend Vitamin D status trend score numeric 7803 1054 0.135 0 2 NA
vitE_quart Vitamin E intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
zn_quart Zinc intake quartile factor 7803 745 0.095 NA NA Q1; Q2; Q3; Q4
Disease Outcomes
ad Reported eczema (atopic dermatitis; 2005–2006 only) binary 7803 6108 0.783 0 1 0=No; 1=Yes
ar Reported allergic rhinitis (hay fever) binary 7803 20 0.003 0 1 0=No; 1=Yes
arthritis Reported rheumatoid or psoriatic arthritis binary 7803 188 0.024 0 1 0=No; 1=Yes
asthma Reported asthma binary 7803 6 0.001 0 1 0=No; 1=Yes
diabetes Reported diabetes binary 7803 7 0.001 0 1 0=No; 1=Yes
fa Reported food allergy binary 7803 3713 0.476 0 1 0=No; 1=Yes
htn Reported hypertension binary 7803 14 0.002 0 1 0=No; 1=Yes

4 Results

Cohort characteristics

Cohort characteristics are shown in Table 2 for the full cohort (2005–2012) and in Table 3 for the time-restricted cohorts for eczema (2005–2006) and food allergy (2007–2010). In general, participants with asthma, allergic rhinitis, eczema, and food allergy were similar to the respective full cohort populations, with only modest differences (e.g., higher proportions of males and small variations in physical activity and smoking patterns). Patients with allergic rhinitis, inflammatory arthritis, diabetes, and hypertension were slightly older than the full cohort. Those with inflammatory arthritis, diabetes, and hypertension also exhibited higher DII scores, while participants with allergic rhinitis had slightly lower DII scores relative to the full cohort. There were smaller differences in DII scores in the eczema and food allergy cohorts relative to their respective time-restricted full cohorts.

# Continuous covariates and categorical levels ----------------------------------------------
## Continuous characteristics to summarize as mean (SE)
char_continuous <- c("age", "pir", "hdl", "non_hdl", "tchol", "hba1c", "dii", "dii_etoh")

## Predictor order 
char_order <- c(
  "age",
  "sex",
  "race",
  "insur",
  "educ",
  "pir",
  "food_sec",
  "smoking",
  "phys_act",
  "alcohol",
  "sbp_cat",
  "dbp_cat",
  "bmi_cat",
  "whtr_cat",
  "hdl",
  "non_hdl",
  "tchol",
  "hba1c",
  "vitD25oh_cat",
  "dii",
  "dii_etoh"
)

## Characteristics to include
char_skeleton_raw <- bind_rows(
  tibble(
    predictor      = char_continuous,
    level          = NA_character_,
    variable_label = uni1_labels[char_continuous],
    type           = "continuous",
    rank           = NA_integer_
  ),
  custom_order_uni1 |>
    mutate(type = "categorical")
)

## Characteristic ordering
char_skeleton <- char_skeleton_raw |>
  mutate(
    predictor = factor(
      predictor,
      levels = c(char_order, setdiff(unique(predictor), char_order))
    )
  ) |>
  arrange(predictor, rank) |>
  mutate(
    predictor = as.character(predictor)
  )

## Helper function to compute characteristics for a single survey design
compute_char_for_group <- function(design, group_name) {
  vars <- design$variables

  map_dfr(
    seq_len(nrow(char_skeleton)),
    function(i) {
      row       <- char_skeleton[i, ]
      predictor <- row$predictor
      label     <- row$variable_label
      type      <- row$type
      lvl       <- row$level

      # If predictor not present, return NA
      if (!predictor %in% names(vars)) {
        return(tibble(
          Group          = group_name,
          Characteristic = label,
          Value          = NA_character_
        ))
      }

      if (type == "continuous") {
        # Continuous: survey-weighted mean (SE)
        f   <- as.formula(paste0("~", predictor))
        est <- svymean(f, design, na.rm = TRUE)
        m   <- as.numeric(est)[1]
        se  <- as.numeric(SE(est))[1]
        value <- sprintf("%.2f (%.2f)", m, se)

      } else {
        # Categorical: survey-weighted prevalence (SE) for each specified level
        var <- vars[[predictor]]

        # Coerce to factor and drop unused levels
        if (!is.factor(var)) {
          var <- factor(var)
        } else {
          var <- droplevels(var)
        }

        # If the target level is missing or not in the factor levels, return NA
        if (is.na(lvl) || !lvl %in% levels(var)) {
          return(tibble(
            Group          = group_name,
            Characteristic = label,
            Value          = NA_character_
          ))
        }

        # Restrict to non-missing values for this predictor
        nonmiss <- !is.na(var)
        if (!any(nonmiss)) {
          return(tibble(
            Group          = group_name,
            Characteristic = label,
            Value          = NA_character_
          ))
        }

        # Subset design to non-missing rows for predictor
        design_nm <- subset(design, nonmiss)

        # Recompute vars for subsetted design and attach clean indicator
        vars_nm <- design_nm$variables
        vars_nm$indicator <- as.numeric(var[nonmiss] == lvl)
        design_nm$variables <- vars_nm

        est <- tryCatch(
          svymean(~indicator, design_nm, na.rm = TRUE),
          error = function(e) NA
        )

        if (all(is.na(est))) {
          value <- NA_character_
        } else {
          p  <- as.numeric(est)[1] * 100
          se <- as.numeric(SE(est))[1] * 100
          value <- sprintf("%.1f%% (%.1f)", p, se)
        }
      }

      tibble(
        Group          = group_name,
        Characteristic = label,
        Value          = value
      )
    }
  )
}

# Full cohort and primary outcomes -----------------------------------------------------------
des_full_primary <- des_full

des_asthma_cases  <- subset(des_asthma_reg,  asthma == 1)
des_ar_cases      <- subset(des_ar_reg,      ar == 1)
des_inflam_cases  <- subset(des_inflam_reg,  arthritis == 1)
des_dm_cases      <- subset(des_dm_reg,      diabetes == 1)
des_htn_cases     <- subset(des_htn_reg,     htn == 1)

group_designs_primary <- list(
  "Full cohort"             = des_full_primary,
  "Asthma"                  = des_asthma_cases,
  "Allergic rhinitis"       = des_ar_cases,
  "Inflammatory arthritis"  = des_inflam_cases,
  "Diabetes"                = des_dm_cases,
  "Hypertension"            = des_htn_cases
)

char_primary_long <- imap_dfr(
  group_designs_primary,
  ~ compute_char_for_group(design = .x, group_name = .y)
)

## Rename characteristics for DII and Vitamin D
char_primary_wide <- char_primary_long |>
  mutate(
    Characteristic = dplyr::recode(
      Characteristic,
      "Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
      "DII including alcohol (per unit)"      = "DII including alcohol"
    ),
    Characteristic = gsub(
      "25\\(OH\\)D category",
      "Vitamin D category",
      Characteristic
    )
  ) |>
  pivot_wider(
    id_cols     = Characteristic,
    names_from  = Group,
    values_from = Value
  )

tbl_char_primary <- char_primary_wide |>
  gt() |>
  tab_header(
    title = md(
      "**Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
    )
  ) |>
  fmt_markdown(columns = everything()) |>
  tab_options(
    table.font.size = px(10),
    column_labels.font.weight = "bold"
  ) |>
  tab_footnote(
    footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
    locations = cells_title(groups = "title")
  )

# Time-restricted eczema and food allergy cohorts -------------------------------------------
ad_cohort <- nhanes_analytic |>
  filter(sddsrvyr == 4, !is.na(ad))

fa_cohort <- nhanes_analytic |>
  filter(sddsrvyr %in% c(5, 6), !is.na(fa))

des_ad_cohort <- make_design_wt8yr(ad_cohort)
des_fa_cohort <- make_design_wt8yr(fa_cohort)

des_ad_cases <- subset(des_ad_reg, ad == 1)
des_fa_cases <- subset(des_fa_reg, fa == 1)

group_designs_secondary <- list(
  "Cohort (2005–2006)"   = des_ad_cohort,
  "Eczema"               = des_ad_cases,
  "Cohort (2007–2010)"   = des_fa_cohort,
  "Food allergy"         = des_fa_cases
)

char_secondary_long <- imap_dfr(
  group_designs_secondary,
  ~ compute_char_for_group(design = .x, group_name = .y)
)

## Rename characteristics for DII and Vitamin D
char_secondary_wide <- char_secondary_long |>
  mutate(
    Characteristic = dplyr::recode(
      Characteristic,
      "Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
      "DII including alcohol (per unit)"      = "DII including alcohol"
    ),
    Characteristic = gsub(
      "25\\(OH\\)D category",
      "Vitamin D category",
      Characteristic
    )
  ) |>
  pivot_wider(
    id_cols     = Characteristic,
    names_from  = Group,
    values_from = Value
  )

tbl_char_secondary <- char_secondary_wide |>
  gt() |>
  tab_header(
    title = md(
      "**Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)**"
    )
  ) |>
  fmt_markdown(columns = everything()) |>
  tab_options(
    table.font.size = px(10),
    column_labels.font.weight = "bold"
  ) |>
  tab_footnote(
    footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
    locations = cells_title(groups = "title")
  )

# Print cohort characteristic tables --------------------------------------------------------
tbl_char_primary
Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension1
Characteristic Full cohort Asthma Allergic rhinitis Inflammatory arthritis Diabetes Hypertension
Age (years) 30.03 (0.15) 29.18 (0.26) 32.09 (0.22) 33.26 (0.71) 33.55 (0.44) 32.77 (0.24)
Sex: Male 51.3% (0.6) 45.1% (1.7) 44.2% (1.6) 31.7% (4.3) 45.9% (4.8) 55.4% (1.8)
Sex: Female 48.7% (0.6) 54.9% (1.7) 55.8% (1.6) 68.3% (4.3) 54.1% (4.8) 44.6% (1.8)
Race/ethnicity: Non-Hispanic White 65.9% (2.0) 71.6% (2.3) 77.7% (2.0) 74.2% (4.6) 55.9% (4.7) 63.2% (2.9)
Race/ethnicity: Non-Hispanic Black 13.8% (1.0) 15.4% (1.5) 10.7% (1.1) 14.2% (3.6) 22.5% (3.3) 20.7% (2.0)
Race/ethnicity: Mexican American 13.1% (1.2) 6.6% (1.0) 5.9% (0.9) 7.6% (2.2) 11.6% (2.2) 10.3% (1.3)
Race/ethnicity: Other Hispanic 7.3% (0.9) 6.3% (1.1) 5.7% (1.1) 4.0% (1.7) 10.1% (2.5) 5.8% (1.1)
Race/ethnicity: Other race NA NA NA NA NA NA
Insurance: No 30.5% (1.0) 26.7% (1.6) 23.9% (1.9) 16.7% (3.8) 24.0% (3.7) 25.6% (1.6)
Insurance: Yes 69.5% (1.0) 73.3% (1.6) 76.1% (1.9) 83.3% (3.8) 76.0% (3.7) 74.4% (1.6)
Education: <9 4.6% (0.4) 1.9% (0.5) 2.5% (0.6) 7.2% (3.2) 6.0% (2.1) 3.4% (0.8)
Education: 9-12 34.4% (1.3) 34.5% (2.0) 26.4% (2.0) 38.6% (5.5) 39.1% (4.4) 37.1% (1.9)
Education: >12 61.0% (1.4) 63.6% (2.2) 71.1% (2.2) 54.3% (7.1) 54.9% (4.8) 59.5% (2.2)
Poverty-income ratio 2.70 (0.05) 2.59 (0.09) 2.96 (0.09) 2.24 (0.17) 2.34 (0.13) 2.58 (0.07)
Food security: Full 83.5% (0.8) 80.5% (1.6) 87.5% (1.3) 74.2% (4.1) 76.1% (4.5) 80.5% (1.8)
Food security: Marginal 12.6% (0.7) 14.5% (1.6) 9.8% (1.0) 19.7% (4.3) 18.1% (4.3) 15.7% (1.6)
Food security: Low 2.6% (0.2) 2.3% (0.4) 1.8% (0.6) 2.5% (1.5) 2.4% (1.3) 2.5% (0.5)
Food security: Very low 1.3% (0.2) 2.6% (0.4) 0.9% (0.2) 3.6% (2.1) 3.4% (1.9) 1.3% (0.5)
Smoking status: Never 58.8% (1.1) 53.1% (2.2) 57.5% (2.4) 39.8% (5.3) 59.1% (4.2) 49.1% (2.0)
Smoking status: Current 26.9% (0.9) 33.6% (2.0) 25.4% (1.9) 46.6% (6.0) 28.1% (4.0) 34.0% (2.1)
Smoking status: Former 14.3% (0.6) 13.4% (1.3) 17.1% (1.8) 13.6% (4.1) 12.8% (3.6) 16.9% (1.5)
Physical activity: Sedentary 34.5% (1.0) 32.2% (1.5) 29.3% (1.7) 37.4% (5.1) 44.9% (3.8) 38.0% (2.1)
Physical activity: Moderate 24.4% (0.7) 25.3% (1.6) 28.8% (1.8) 38.8% (5.3) 32.9% (4.7) 27.5% (1.6)
Physical activity: Intense 41.1% (1.2) 42.4% (2.1) 41.9% (2.2) 23.8% (4.9) 22.2% (3.8) 34.5% (2.0)
Alcohol use: No 19.0% (0.7) 16.2% (1.4) 16.5% (1.5) 25.1% (4.3) 26.9% (3.7) 17.4% (1.3)
Alcohol use: Yes 81.0% (0.7) 83.8% (1.4) 83.5% (1.5) 74.9% (4.3) 73.1% (3.7) 82.6% (1.3)
Systolic blood pressure category: <120 68.5% (0.9) 67.4% (1.5) 70.6% (1.8) 74.2% (5.9) 42.6% (5.3) 38.5% (2.6)
Systolic blood pressure category: 120-129 21.3% (0.7) 23.1% (1.4) 20.2% (1.5) 17.4% (5.6) 37.3% (4.7) 31.7% (2.3)
Systolic blood pressure category: 130-139 6.7% (0.4) 6.1% (0.7) 6.1% (1.0) 8.4% (2.6) 13.3% (3.4) 16.8% (1.8)
Systolic blood pressure category: 140-159 3.3% (0.3) 3.0% (0.6) 2.9% (0.5) NA 5.1% (1.3) 10.9% (1.2)
Systolic blood pressure category: >=160 0.3% (0.1) 0.4% (0.2) 0.2% (0.1) NA 1.6% (1.2) 2.1% (0.4)
Diastolic blood pressure category: <80 84.9% (0.8) 84.1% (1.3) 82.8% (1.6) 83.5% (4.3) 66.3% (4.1) 64.2% (2.5)
Diastolic blood pressure category: 80-89 11.7% (0.6) 12.7% (1.3) 14.4% (1.5) 11.0% (3.4) 25.0% (3.7) 23.6% (2.0)
Diastolic blood pressure category: 90-99 2.8% (0.2) 2.5% (0.5) 2.5% (0.6) 5.5% (3.3) 8.0% (2.5) 9.0% (1.0)
Diastolic blood pressure category: >=100 0.7% (0.1) 0.7% (0.2) 0.3% (0.2) NA 0.7% (0.5) 3.2% (0.6)
BMI category: Underweight 2.3% (0.2) 1.2% (0.3) 1.8% (0.5) 0.4% (0.4) NA 0.7% (0.3)
BMI category: Normal weight 36.5% (1.0) 33.9% (2.1) 38.5% (2.4) 26.7% (6.6) 16.9% (3.5) 18.2% (1.7)
BMI category: Overweight 30.3% (0.8) 27.6% (1.6) 29.7% (2.5) 22.0% (5.3) 23.8% (3.7) 25.0% (2.1)
BMI category: Obesity I 17.5% (0.6) 19.5% (1.4) 16.3% (1.4) 25.3% (5.8) 16.6% (3.2) 25.7% (1.9)
BMI category: Obesity II 7.9% (0.4) 10.1% (1.1) 8.3% (1.1) 13.5% (3.0) 16.7% (3.7) 15.9% (1.4)
BMI category: Obesity III 5.6% (0.3) 7.7% (0.9) 5.4% (0.9) 12.0% (3.5) 25.9% (3.6) 14.6% (1.4)
Waist-to-height ratio category: <0.5 32.9% (1.1) 31.8% (2.1) 32.8% (2.1) 25.3% (6.5) 15.8% (3.9) 15.0% (1.5)
Waist-to-height ratio category: 0.5-0.59 40.1% (0.9) 35.3% (1.8) 39.9% (2.0) 34.9% (6.2) 19.8% (3.5) 34.8% (1.8)
Waist-to-height ratio category: >=0.6 27.1% (1.0) 32.9% (1.9) 27.3% (2.0) 39.8% (6.3) 64.4% (4.8) 50.2% (1.8)
HDL cholesterol (mg/dL) 51.20 (0.29) 51.37 (0.60) 51.80 (0.66) 51.40 (2.33) 47.78 (1.46) 46.61 (0.67)
Non-HDL cholesterol (mg/dL) 135.62 (0.63) 132.25 (1.30) 136.81 (1.75) 140.36 (4.26) 151.29 (4.03) 147.60 (1.81)
Total cholesterol (mg/dL) 186.82 (0.55) 183.61 (1.13) 188.61 (1.49) 191.76 (3.97) 199.07 (3.80) 194.21 (1.69)
HbA1c (%) 5.29 (0.01) 5.29 (0.02) 5.33 (0.03) 5.40 (0.09) 7.92 (0.21) 5.57 (0.04)
Vitamin D category: <20 1.2% (0.2) 1.1% (0.3) 0.9% (0.3) 1.4% (1.1) 0.7% (0.5) 2.0% (0.5)
Vitamin D category: 20-29 6.2% (0.5) 6.1% (0.8) 4.6% (0.8) 5.6% (2.0) 10.5% (2.3) 7.0% (0.9)
Vitamin D category: >=30 92.6% (0.6) 92.8% (0.9) 94.6% (0.8) 93.0% (2.5) 88.8% (2.4) 91.0% (1.2)
Dietary Inflammatory Index (DII) 2.54 (0.06) 2.60 (0.12) 2.30 (0.13) 3.29 (0.36) 2.82 (0.24) 2.78 (0.11)
DII including alcohol 2.68 (0.06) 2.74 (0.12) 2.44 (0.13) 3.49 (0.36) 3.03 (0.24) 2.92 (0.11)
1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.
tbl_char_secondary
Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)1
Characteristic Cohort (2005–2006) Eczema Cohort (2007–2010) Food allergy
Age (years) 30.26 (0.22) 29.86 (0.97) 30.07 (0.15) 30.23 (0.26)
Sex: Male 51.8% (1.2) 39.9% (6.1) 51.4% (0.9) 43.9% (2.9)
Sex: Female 48.2% (1.2) 60.1% (6.1) 48.6% (0.9) 56.1% (2.9)
Race/ethnicity: Non-Hispanic White 68.3% (2.9) 79.5% (4.5) 65.9% (2.8) 63.0% (3.9)
Race/ethnicity: Non-Hispanic Black 13.7% (2.1) 16.1% (5.0) 13.6% (1.2) 19.4% (2.6)
Race/ethnicity: Mexican American 12.7% (1.5) 3.0% (1.1) 13.3% (1.8) 8.2% (1.4)
Race/ethnicity: Other Hispanic 5.2% (1.3) 1.4% (1.0) 7.2% (1.2) 9.4% (2.2)
Race/ethnicity: Other race NA NA NA NA
Insurance: No 29.5% (2.4) 21.6% (4.6) 31.4% (1.2) 26.2% (3.0)
Insurance: Yes 70.5% (2.4) 78.4% (4.6) 68.6% (1.2) 73.8% (3.0)
Education: <9 5.3% (0.9) 0.6% (0.1) 4.6% (0.5) 2.7% (1.0)
Education: 9-12 35.8% (2.4) 37.4% (6.0) 36.7% (1.7) 28.8% (2.9)
Education: >12 59.0% (2.8) 61.9% (6.0) 58.7% (1.7) 68.5% (3.2)
Poverty-income ratio 2.91 (0.06) 3.03 (0.13) 2.68 (0.05) 2.74 (0.09)
Food security: Full 76.7% (1.6) 72.3% (5.1) 87.4% (0.9) 89.7% (2.0)
Food security: Marginal 9.5% (0.9) 13.2% (2.6) 12.6% (0.9) 10.3% (2.0)
Food security: Low 9.2% (0.9) 8.3% (3.5) NA NA
Food security: Very low 4.6% (0.6) 6.1% (2.3) NA NA
Smoking status: Never 55.9% (1.4) 46.9% (6.5) 58.2% (1.7) 57.2% (2.8)
Smoking status: Current 29.5% (1.8) 40.1% (6.2) 27.6% (1.2) 25.7% (2.5)
Smoking status: Former 14.6% (0.8) 13.0% (5.2) 14.2% (1.0) 17.2% (2.2)
Physical activity: Sedentary 24.7% (1.6) 20.7% (5.3) 39.7% (1.3) 36.0% (3.5)
Physical activity: Moderate 24.0% (0.9) 25.6% (5.2) 23.8% (1.0) 26.1% (2.7)
Physical activity: Intense 51.2% (1.5) 53.8% (6.4) 36.5% (1.7) 37.9% (3.4)
Alcohol use: No 21.2% (1.9) 14.9% (5.2) 19.1% (0.9) 22.9% (2.3)
Alcohol use: Yes 78.8% (1.9) 85.1% (5.2) 80.9% (0.9) 77.1% (2.3)
Systolic blood pressure category: <120 65.9% (1.9) 65.4% (6.6) 69.4% (1.1) 76.4% (3.1)
Systolic blood pressure category: 120-129 21.4% (1.5) 22.1% (3.9) 21.3% (0.9) 14.5% (2.6)
Systolic blood pressure category: 130-139 8.8% (0.9) 8.9% (3.6) 5.7% (0.4) 6.6% (1.5)
Systolic blood pressure category: 140-159 3.7% (0.6) 3.6% (2.0) 3.3% (0.3) 2.5% (0.9)
Systolic blood pressure category: >=160 0.2% (0.1) NA 0.3% (0.1) NA
Diastolic blood pressure category: <80 86.9% (1.6) 83.9% (5.9) 84.5% (0.9) 83.1% (2.6)
Diastolic blood pressure category: 80-89 10.1% (1.4) 15.0% (5.8) 12.0% (0.8) 11.7% (2.1)
Diastolic blood pressure category: 90-99 2.6% (0.5) 1.1% (0.7) 2.8% (0.4) 4.2% (1.3)
Diastolic blood pressure category: >=100 0.5% (0.2) NA 0.6% (0.2) 1.0% (0.7)
BMI category: Underweight 2.6% (0.6) 1.2% (1.0) 2.0% (0.3) 2.0% (0.9)
BMI category: Normal weight 37.7% (1.9) 44.4% (4.6) 35.8% (1.3) 35.9% (3.1)
BMI category: Overweight 31.0% (1.8) 19.5% (3.4) 30.1% (0.9) 28.6% (2.7)
BMI category: Obesity I 16.5% (1.2) 16.0% (3.8) 18.0% (0.8) 18.7% (2.1)
BMI category: Obesity II 7.2% (0.9) 7.1% (2.7) 8.3% (0.6) 8.1% (1.6)
BMI category: Obesity III 5.0% (0.6) 11.8% (2.6) 5.8% (0.5) 6.8% (1.2)
Waist-to-height ratio category: <0.5 33.7% (2.1) 34.6% (5.2) 32.3% (1.5) 33.6% (3.1)
Waist-to-height ratio category: 0.5-0.59 41.2% (2.1) 34.5% (6.1) 40.3% (1.2) 36.8% (3.6)
Waist-to-height ratio category: >=0.6 25.1% (2.2) 30.9% (6.0) 27.4% (1.4) 29.6% (3.0)
HDL cholesterol (mg/dL) 52.58 (0.39) 53.35 (1.14) 50.53 (0.42) 51.17 (1.33)
Non-HDL cholesterol (mg/dL) 136.91 (1.38) 133.34 (5.78) 136.43 (0.88) 132.23 (2.68)
Total cholesterol (mg/dL) 189.50 (1.16) 186.70 (5.20) 186.96 (0.75) 183.40 (2.28)
HbA1c (%) 5.20 (0.02) 5.09 (0.05) 5.31 (0.01) 5.28 (0.03)
Vitamin D category: <20 0.3% (0.1) 0.4% (0.5) 1.6% (0.3) 2.8% (0.9)
Vitamin D category: 20-29 5.2% (0.8) 4.2% (1.7) 6.4% (0.7) 9.5% (1.8)
Vitamin D category: >=30 94.5% (0.9) 95.3% (1.8) 92.0% (0.9) 87.6% (2.4)
Dietary Inflammatory Index (DII) 1.89 (0.06) 2.09 (0.16) 3.31 (0.08) 2.98 (0.14)
DII including alcohol 2.03 (0.07) 2.22 (0.17) 3.46 (0.09) 3.15 (0.14)
1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.

Weighted prevalence analysis

Survey-weighted prevalence estimates were calculated for all disease outcomes, both overall and across DII quartiles (Table 4; Figure 1). Overall disease prevalence ranged from 1.3% for inflammatory arthritis to 15.9% for asthma. As compared with the weighted prevalence in the least inflammatory quartile (Q1), the weighted prevalence in the most inflammatory quartile (Q4) was significantly higher for inflammatory arthritis (2.1% vs. 0.8%; p<0.05) and significantly lower for allergic rhinitis (11.5% vs. 14.1%; p<0.05). There was a non-significant trend of higher Q4 disease prevalence for eczema and hypertension. Together, this suggested variable, modest effects of dietary inflammatory potential on disease outcomes.

# Weighted prevalence by DII quartile----------------------------------------------------------

## Outcome configurations for prevalence
outcome_list_all <- c(outcome_list, outcome_list_secondary)

compute_prev_by_dii <- function(info) {
  des       <- info$design
  y         <- info$var
  label_raw <- info$label

  ### Re-label atopic dermatitis to eczema for consistency
  label <- recode(label_raw, "Atopic dermatitis" = "Eczema")

  f    <- as.formula(paste0("~", y))
  vars <- des$variables

  rows <- list()

  ### Overall prevalence
  est_overall <- svymean(f, design = des, na.rm = TRUE)
  p_overall   <- as.numeric(est_overall)[1] * 100
  se_overall  <- as.numeric(SE(est_overall))[1] * 100

  rows[[length(rows) + 1]] <- tibble(
    Outcome   = label,
    dii_group = "Overall",
    prev_pct  = p_overall,
    prev_se   = se_overall
  )

  ### Prevalence by DII quartile
  if ("dii_quart" %in% names(vars)) {
    dii_levels <- levels(vars$dii_quart)
    for (lvl in dii_levels) {
      des_sub <- subset(des, dii_quart == lvl)
      est     <- svymean(f, design = des_sub, na.rm = TRUE)
      p       <- as.numeric(est)[1] * 100
      se      <- as.numeric(SE(est))[1] * 100

      rows[[length(rows) + 1]] <- tibble(
        Outcome   = label,
        dii_group = as.character(lvl),
        prev_pct  = p,
        prev_se   = se
      )
    }
  }

  if (length(rows) == 0L) {
    return(tibble(
      Outcome   = character(0),
      dii_group = character(0),
      prev_pct  = numeric(0),
      prev_se   = numeric(0)
    ))
  }

  bind_rows(rows)
}

prev_all_long <- map_dfr(
  outcome_list_all,
  compute_prev_by_dii
)

## Harmonize DII group labels and ordering of outcomes
prev_all_long <- prev_all_long |>
  mutate(
    dii_group_simple = case_when(
      dii_group == "Overall"        ~ "Overall",
      grepl("^Q1", dii_group)       ~ "Q1",
      grepl("^Q2", dii_group)       ~ "Q2",
      grepl("^Q3", dii_group)       ~ "Q3",
      grepl("^Q4", dii_group)       ~ "Q4",
      TRUE                          ~ dii_group
    ),
    ## Set base outcome order (for tables, etc.)
    Outcome = factor(
      Outcome,
      levels = c(
        "Asthma",
        "Allergic rhinitis",
        "Inflammatory arthritis",
        "Diabetes",
        "Hypertension",
        "Eczema",
        "Food allergy"
      )
    )
  )

## Compute p-values for Q2–Q4 vs Q1 using survey-weighted logistic regression
compute_prev_pvals <- function(info) {
  des       <- info$design
  y         <- info$var
  label_raw <- info$label

  label <- recode(label_raw, "Atopic dermatitis" = "Eczema")

  vars <- des$variables
  if (!("dii_quart" %in% names(vars))) {
    return(tibble(
      Outcome          = character(0),
      dii_group_simple = character(0),
      p_vs_Q1          = numeric(0)
    ))
  }

  dq <- vars$dii_quart
  if (!is.factor(dq)) {
    dq <- factor(dq)
  }

  levels_dq <- levels(dq)
  if (length(levels_dq) < 2) {
    return(tibble(
      Outcome          = character(0),
      dii_group_simple = character(0),
      p_vs_Q1          = numeric(0)
    ))
  }

  other_levels <- levels_dq[-1]

  # Fit survey-weighted logistic regression: outcome ~ dii_quart
  form <- as.formula(paste0(y, " ~ dii_quart"))

  fit <- tryCatch(
    svyglm(form, design = des, family = quasibinomial()),
    error = function(e) NULL
  )

  if (is.null(fit)) {
    return(tibble(
      Outcome          = character(0),
      dii_group_simple = character(0),
      p_vs_Q1          = numeric(0)
    ))
  }

  coefs <- summary(fit)$coef

  out_rows <- purrr::map_dfr(other_levels, function(lvl) {
    term <- paste0("dii_quart", lvl)

    if (!term %in% rownames(coefs)) {
      p_val <- NA_real_
    } else {
      p_val <- coefs[term, "Pr(>|t|)"]
    }

    quart_simple <- case_when(
      grepl("^Q1", lvl) ~ "Q1",
      grepl("^Q2", lvl) ~ "Q2",
      grepl("^Q3", lvl) ~ "Q3",
      grepl("^Q4", lvl) ~ "Q4",
      TRUE              ~ lvl
    )

    tibble(
      Outcome          = label,
      dii_group_simple = quart_simple,
      p_vs_Q1          = as.numeric(p_val)
    )
  })

  out_rows
}

prev_pvals <- purrr::map_dfr(
  outcome_list_all,
  compute_prev_pvals
)

## Join p-values back to prevalence data
prev_all_long <- prev_all_long |>
  left_join(
    prev_pvals,
    by = c("Outcome", "dii_group_simple")
  )

## Color scheme for outcomes
disease_base_cols <- c(
  "Asthma"                 = "#5E3C99",
  "Allergic rhinitis"      = "#89CFF0",
  "Inflammatory arthritis" = "#009E73",
  "Diabetes"               = "#F0E442",
  "Hypertension"           = "#D55E00",
  "Eczema"                 = "#E78AC3",
  "Food allergy"           = "#4B2E0F"
)

## Alpha for quartiles; overall uses a mid alpha
alpha_quart <- c(
  "Q1" = 0.4,
  "Q2" = 0.6,
  "Q3" = 0.8,
  "Q4" = 1.0
)
alpha_overall <- 0.7

prev_all_long <- prev_all_long |>
  mutate(
    base_col = disease_base_cols[as.character(Outcome)],
    fill_col = case_when(
      dii_group_simple == "Overall" ~ scales::alpha(base_col, alpha_overall), # mid shade
      TRUE                          ~ scales::alpha(base_col, alpha_quart[dii_group_simple])
    ),
    ## Pretty labels for facets, including split lines
    Outcome_pretty = forcats::fct_recode(
      Outcome,
      "Allergic\nrhinitis"      = "Allergic rhinitis",
      "Inflammatory\narthritis" = "Inflammatory arthritis"
    ),
    ## Explicit facet order:
    ## Asthma, Allergic rhinitis, Inflammatory arthritis, Eczema,
    ## Food allergy, Diabetes, Hypertension
    Outcome_pretty = factor(
      Outcome_pretty,
      levels = c(
        "Asthma",
        "Allergic\nrhinitis",
        "Inflammatory\narthritis",
        "Eczema",
        "Food allergy",
        "Diabetes",
        "Hypertension"
      )
    ),
    # Numeric x positions: extra gap between Overall and Q1
    x_pos = case_when(
      dii_group_simple == "Overall" ~ 1.0,
      dii_group_simple == "Q1"      ~ 2.5,
      dii_group_simple == "Q2"      ~ 3.5,
      dii_group_simple == "Q3"      ~ 4.5,
      dii_group_simple == "Q4"      ~ 5.5,
      TRUE                          ~ NA_real_
    )
  )

## Split data for quartiles vs overall
prev_quart <- prev_all_long |>
  filter(dii_group_simple != "Overall")

prev_overall <- prev_all_long |>
  filter(dii_group_simple == "Overall")

## Table 4: Weighted prevalence table with SEs and p-values------------------------------------

## Ensure Outcome ordering for the table as requested
prev_all_long <- prev_all_long |>
  mutate(
    Outcome = factor(
      Outcome,
      levels = c(
        "Asthma",
        "Allergic rhinitis",
        "Inflammatory arthritis",
        "Diabetes",
        "Hypertension",
        "Eczema",
        "Food allergy"
      )
    )
  )

table4_df <- prev_all_long |>
  mutate(
    Condition = case_when(
      dii_group_simple == "Overall" ~ "Overall",
      TRUE                          ~ dii_group
    )
  ) |>
  select(
    Outcome,
    Condition,
    `Weighted prevalence (%)` = prev_pct,
    `SE (%)`                  = prev_se,
    `p value`                 = p_vs_Q1
  ) |>
  arrange(Outcome, Condition)

nhanes_table4_gt <- table4_df |>
  gt(groupname_col = "Outcome") |>
  tab_header(
    title = md("**Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile**")
  ) |>
  fmt_number(
    columns = c(`Weighted prevalence (%)`, `SE (%)`),
    decimals = 1
  ) |>
  fmt_number(
    columns = `p value`,
    decimals = 3
  ) |>
  cols_label(
    Condition                 = md("Condition"),
    `Weighted prevalence (%)` = md("Weighted prevalence (%)"),
    `SE (%)`                  = md("SE (%)"),
    `p value`                 = md("p value")
  ) |>
  tab_source_note(
    md(
      "P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests."
    )
  ) |>
  tab_options(
    table.font.size       = px(12),
    data_row.padding      = px(1),
    row_group.padding     = px(1),
    column_labels.padding = px(1),
    table.width           = pct(80)
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = list(
      cells_column_labels(everything()),
      cells_row_groups()
    )
  )

## Weighted prevalence plot--------------------------------------------------------------------
weighted_prev_plot_all <- ggplot() +
  ### Quartile bars
  geom_col(
    data = prev_quart,
    aes(
      x    = x_pos,
      y    = prev_pct,
      fill = fill_col
    ),
    width = 0.9
  ) +
  ### Overall bars: mid-shade fill, thick dark border
  geom_col(
    data = prev_overall,
    aes(
      x     = x_pos,
      y     = prev_pct,
      fill  = fill_col,
      color = base_col   # darkest outline
    ),
    width = 0.9,
    size  = 0.9
  ) +
  facet_wrap(~ Outcome_pretty, ncol = 3, scales = "free_y") +
  scale_fill_identity(guide = "none") +
  scale_color_identity(guide = "none") +
  scale_x_continuous(
    breaks = c(1.0, 2.5, 3.5, 4.5, 5.5),
    labels = c("Overall", "Q1", "Q2", "Q3", "Q4"),
    expand = expansion(mult = c(0.05, 0.05))
  ) +
  labs(
    x     = NULL,
    y     = "Weighted prevalence (%)",
    title = "Figure 1: Weighted Prevalence of Allergic, Immunologic,\nCardiometabolic Conditions,\nOverall and by Dietary Inflammatory Index Quartile"
  ) +
  theme_minimal() +
  theme(
    plot.title         = element_text(face = "bold", size = 15, hjust = 0.5),
    strip.text         = element_text(face = "bold", size = 12),  # one size smaller
    axis.title.y       = element_text(face = "bold", size = 12),
    axis.title.x       = element_blank(),
    axis.text.x        = element_text(size = 11, angle = 45, hjust = 1),
    axis.text.y        = element_text(size = 11),
    aspect.ratio       = 1,
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(color = "grey90"),
    panel.grid.minor.y = element_blank()
  )

## Print Table 4 and Figure 1------------------------------------------------------------------
nhanes_table4_gt
Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile
Condition Weighted prevalence (%) SE (%) p value
Asthma
Overall 15.9 0.6 NA
Q1 (most anti-inflammatory) 16.1 1.2 NA
Q2 14.9 1.1 0.412
Q3 16.1 1.2 0.994
Q4 (most pro-inflammatory) 17.0 1.0 0.566
Allergic rhinitis
Overall 11.9 0.5 NA
Q1 (most anti-inflammatory) 14.1 1.1 NA
Q2 11.6 1.0 0.103
Q3 11.5 1.0 0.082
Q4 (most pro-inflammatory) 11.5 0.8 0.035
Inflammatory arthritis
Overall 1.3 0.1 NA
Q1 (most anti-inflammatory) 0.8 0.3 NA
Q2 1.6 0.4 0.063
Q3 0.9 0.2 0.721
Q4 (most pro-inflammatory) 2.1 0.5 0.013
Diabetes
Overall 2.0 0.2 NA
Q1 (most anti-inflammatory) 2.1 0.3 NA
Q2 1.3 0.3 0.092
Q3 2.5 0.4 0.414
Q4 (most pro-inflammatory) 2.4 0.4 0.527
Hypertension
Overall 11.3 0.5 NA
Q1 (most anti-inflammatory) 10.5 0.9 NA
Q2 10.5 1.0 0.994
Q3 11.6 1.0 0.403
Q4 (most pro-inflammatory) 13.1 0.9 0.062
Eczema
Overall 7.6 0.8 NA
Q1 (most anti-inflammatory) 7.2 1.5 NA
Q2 6.9 1.2 0.866
Q3 8.0 1.2 0.673
Q4 (most pro-inflammatory) 11.4 4.3 0.308
Food allergy
Overall 10.1 0.5 NA
Q1 (most anti-inflammatory) 11.9 1.4 NA
Q2 9.3 1.0 0.130
Q3 11.6 1.2 0.882
Q4 (most pro-inflammatory) 9.1 0.8 0.085
P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests.
weighted_prev_plot_all

Univariable regression analyses

Survey-weighted univariable logistic regression analyses were performed to examine associations between core sociodemographic, behavioral, clinical, and dietary covariates (Tables 5-6) or individual nutrient quartile covariates (Tables 7-8) and disease outcomes. Results for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension are shown in Table 5 and Table 7. Results for eczema and food allergy are shown in Table 6 and Table 8.

Among the core covariates, the odds of allergic rhinitis increased with age (OR 1.06, 95% CI 1.05–1.08; p<0.001), while asthma decreased with age (OR 0.97, 95% CI 0.96–0.99; p<0.001). Female participants had higher odds of asthma (OR 1.35, 95% CI 1.17–1.56; p<0.001), allergic rhinitis (OR 1.39, 95% CI 1.19–1.61; p<0.001), and inflammatory arthritis (OR 2.31, 95% CI 1.53–3.49; p<0.001). Non-Hispanic Black (OR 0.62, 95% CI 0.50–0.78; p<0.001) and Mexican American (OR 0.35, 95% CI 0.27–0.45; p<0.001) participants had lower odds of allergic rhinitis, and Mexican American participants (OR 0.42, 95% CI 0.32–0.55; p<0.001) also had lower odds of asthma. Smoking increased the odds of inflammatory arthritis (OR 2.64, 95% CI 1.63–4.27; p<0.001). Among secondary outcomes, there was a trend of higher odds of eczema associated with obesity class III (OR 2.22, 95% CI 1.24–3.97; p=0.088), while Mexican American participants had lower odds of food allergy (OR 0.62, 95% CI 0.48–0.81; p=0.013). Regarding dietary inflammatory exposures, DII inclusive of alcohol intake was significantly associated with lower odds of allergic rhinitis (OR 0.96, 95% CI 0.92–0.99; p=0.044) and higher odds of hypertension (OR 1.05, 95% CI 1.01–1.08; p=0.049). In general, DII associations were not significant. Cardiometabolic outcomes were strongly associated with age and adiposity, for example increased odds of hypertension with age (OR 1.09, 95% CI 1.07–1.10; p<0.001), obesity class III (OR 7.15, 95% CI 5.10–10.02; p<0.001), and elevated waist-to-height ratio (OR 4.56, 95% CI 3.49–5.97; p<0.001).

In nutrient-specific analyses, there was a large degree of variability across conditions, and many associations did not achieve statistical significance after correction for multiple comparisons. Nevertheless, several significant associations were identified. For inflammatory arthritis, intakes of carbohydrates (Q3 OR 0.26, 95% CI 0.13–0.49; p=0.019) and total energy (Q3 OR 0.31, 95% CI 0.16–0.60; p=0.047) within the third quartile of the intake distribution were associated with lower odds of disease. Several non-significant trends were observed for cardiometabolic outcomes. For example, higher saturated fat intake (Q4 OR 1.29, 95% CI 1.04–1.60; p=0.218) and higher cholesterol intake (Q4 OR 1.39, 95% CI 1.11–1.73; p=0.095) showed a trend of increased odds of hypertension.

# Univariable analysis 1: core covariates------------------------------------------------------
## Define univariable analysis 1 predictors
uni1_predictors <- c(
  "age",
  "sex",
  "race",
  "insur",
  "educ",
  "pir",
  "food_sec",
  "smoking",
  "phys_act",
  "alcohol",
  "sbp_cat",
  "dbp_cat",
  "bmi_cat",
  "whtr_cat",
  "hdl",
  "non_hdl",
  "tchol",
  "hba1c",
  "vitD25oh_cat",
  "vitD25oh_cat_trend",
  "dii",
  "dii_quart_trend",
  "dii_etoh"
)

uni1_predictor_order <- uni1_predictors

## Helper to clean Vitamin D and DII labels
clean_vitD_DII_labels <- function(df) {
  df |>
    mutate(
      variable_label = as.character(variable_label),
      variable_label = str_replace_all(variable_label, "25\\(OH\\)D", "Vitamin D"),
      variable_label = str_replace(
        variable_label,
        "Vitamin D category trend \\(0–2\\)",
        "Vitamin D category trend"
      ),
      variable_label = str_replace(
        variable_label,
        "category trend \\(0–2\\)",
        "category trend"
      ),
      variable_label = str_replace(
        variable_label,
        "Dietary Inflammatory Index \\(per unit\\)",
        "DII"
      ),
      variable_label = str_replace(
        variable_label,
        "DII including alcohol \\(per unit\\)",
        "DII including alcohol"
      ),
      variable_label = str_replace(
        variable_label,
        "DII quartile trend \\(0–3\\)",
        "DII quartile trend"
      ),
      variable_label = factor(variable_label)
    )
}

## Helper function for univariable tables in gt
make_uni_gt <- function(data, title_text) {
  data |>
    gt() |>
    tab_header(
      title = md(title_text)
    ) |>
    fmt_markdown(columns = everything()) |>
    tab_options(table.font.size = px(10)) |>
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_column_labels(everything())
    ) |>
    tab_footnote(
      footnote = "p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method",
      locations = cells_column_labels(columns = contains("p value"))
    )
}

## Univariable analysis 1 helper function
run_uni_model1 <- function(outcome_name, outcome_info, predictor) {
  des <- outcome_info$design
  y   <- outcome_info$var

  if (!predictor %in% names(des$variables)) {
    warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
    return(tibble())
  }

  form_uni <- as.formula(paste0(y, " ~ ", predictor))

  fit <- svyglm(
    form_uni,
    design = des,
    family = quasibinomial()
  )

  td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
    filter(term != "(Intercept)") |>
    mutate(
      outcome     = outcome_name,
      outcome_lab = outcome_info$label,
      predictor   = predictor
    )

  base_label <- uni1_labels[[predictor]]
  if (is.null(base_label)) base_label <- predictor

  add_ref_and_label(
    td           = td,
    des          = des,
    predictor    = predictor,
    outcome_name = outcome_name,
    outcome_info = outcome_info,
    base_label   = base_label
  )
}

## Univariable analysis 1 primary outcomes
uni1_primary_long <- map_dfr(
  names(outcome_list),
  function(out_nm) {
    info <- outcome_list[[out_nm]]
    map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
  }
)

uni1_primary_long <- uni1_primary_long |>
  rename(
    OR  = estimate,
    LCL = conf.low,
    UCL = conf.high,
    p   = p.value
  ) |>
  mutate(
    p_fdr = p.adjust(p, method = "BH"),
    or_ci = if_else(
      is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
      "1 (Ref)",
      sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
    ),
    p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
  ) |>
  # Use predictor and variable_label to pull in ranks for ordering
  left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
  mutate(
    predictor = factor(predictor, levels = uni1_predictor_order),
    rank      = coalesce(rank, 1000L)
  ) |>
  clean_vitD_DII_labels() |>
  # Override rank for specific predictors/levels to enforce desired order
  mutate(
    rank2 = case_when(
      # Education: <9, 9–12, >12
      predictor == "educ" & str_detect(variable_label, "Education: <9 years")       ~ 1L,
      predictor == "educ" & str_detect(variable_label, "Education: 9–12 years")     ~ 2L,
      predictor == "educ" & str_detect(variable_label, "Education: >12 years")      ~ 3L,

      # BMI: Underweight, Normal, Overweight, Obesity I, II, III
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal")     ~ 2L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I")  ~ 4L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,

      # Vitamin D category: <20, 20–29, ≥30
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20")   ~ 1L,
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30")   ~ 3L,

      # Systolic BP: <120, 120–129, 130–139, 140–159, ≥160
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120")    ~ 1L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160")    ~ 5L,

      # Diastolic BP: <80, 80–89, 90–99, ≥100
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80")    ~ 1L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89")  ~ 2L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99")  ~ 3L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100")   ~ 4L,

      # Waist-to-height ratio: <0.5, 0.5–0.59, ≥0.6
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5")     ~ 1L,
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6")     ~ 3L,

      TRUE ~ rank
    )
  ) |>
  arrange(predictor, rank2, variable_label, outcome_lab)

uni1_primary_wide <- uni1_primary_long |>
  select(
    variable_label,
    outcome_lab,
    or_ci,
    p_fdr_fmt
  ) |>
  pivot_wider(
    id_cols    = variable_label,
    names_from = outcome_lab,
    values_from = c(or_ci, p_fdr_fmt),
    names_glue = "{outcome_lab} {.value}"
  ) |>
  rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
  rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
  rename(Variable = variable_label)

uni1_primary_outcomes_order <- c(
  "Asthma",
  "Allergic rhinitis",
  "Inflammatory arthritis",
  "Diabetes",
  "Hypertension"
)

uni1_primary_col_order <- c(
  "Variable",
  as.vector(rbind(
    paste0(uni1_primary_outcomes_order, " OR (95% CI)"),
    paste0(uni1_primary_outcomes_order, " p value")
  ))
)

uni1_primary_wide <- uni1_primary_wide |>
  select(all_of(uni1_primary_col_order))

tbl_uni1_primary <- make_uni_gt(
  uni1_primary_wide,
  "**Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)

## Univariable analysis 1 secondary outcomes
uni1_secondary_long <- map_dfr(
  names(outcome_list_secondary),
  function(out_nm) {
    info <- outcome_list_secondary[[out_nm]]
    map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
  }
)

uni1_secondary_long <- uni1_secondary_long |>
  rename(
    OR  = estimate,
    LCL = conf.low,
    UCL = conf.high,
    p   = p.value
  ) |>
  mutate(
    p_fdr   = p.adjust(p, method = "BH"),
    or_ci   = if_else(
      is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
      "1 (Ref)",
      sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
    ),
    p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
  ) |>
  left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
  mutate(
    predictor = factor(predictor, levels = uni1_predictor_order),
    rank      = coalesce(rank, 1000L)
  ) |>
  clean_vitD_DII_labels() |>
  mutate(
    rank2 = case_when(
      # Education: <9, 9–12, >12
      predictor == "educ" & str_detect(variable_label, "Education: <9 years")       ~ 1L,
      predictor == "educ" & str_detect(variable_label, "Education: 9–12 years")     ~ 2L,
      predictor == "educ" & str_detect(variable_label, "Education: >12 years")      ~ 3L,

      # BMI: Underweight, Normal, Overweight, Obesity I, II, III
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal")     ~ 2L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I")  ~ 4L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
      predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,

      # Vitamin D category: <20, 20–29, ≥30
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20")   ~ 1L,
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
      predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30")   ~ 3L,

      # Systolic BP: <120, 120–129, 130–139, 140–159, ≥160
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120")    ~ 1L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
      predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160")    ~ 5L,

      # Diastolic BP: <80, 80–89, 90–99, ≥100
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80")    ~ 1L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89")  ~ 2L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99")  ~ 3L,
      predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100")   ~ 4L,

      # Waist-to-height ratio: <0.5, 0.5–0.59, ≥0.6
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5")     ~ 1L,
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
      predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6")     ~ 3L,

      TRUE ~ rank
    )
  ) |>
  arrange(predictor, rank2, variable_label, outcome_lab)

uni1_secondary_wide <- uni1_secondary_long |>
  select(
    variable_label,
    outcome_lab,
    or_ci,
    p_fdr_fmt
  ) |>
  pivot_wider(
    id_cols    = variable_label,
    names_from = outcome_lab,
    values_from = c(or_ci, p_fdr_fmt),
    names_glue = "{outcome_lab} {.value}"
  ) |>
  rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
  rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
  rename(Variable = variable_label)

uni1_secondary_outcomes_order <- c(
  "Atopic dermatitis",
  "Food allergy"
)

uni1_secondary_col_order <- c(
  "Variable",
  as.vector(rbind(
    paste0(uni1_secondary_outcomes_order, " OR (95% CI)"),
    paste0(uni1_secondary_outcomes_order, " p value")
  ))
)

uni1_secondary_wide <- uni1_secondary_wide |>
  select(all_of(uni1_secondary_col_order))

tbl_uni1_secondary <- make_uni_gt(
  uni1_secondary_wide,
  "**Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy**"
)

# Univariable modeling 2: individual DII nutrient quartiles------------------------------------
## Define univariable analysis 2 predictors
dii_nutrient_bases <- c(
  "kcal", "carb", "protein", "totalfat", "satfat",
  "pufa", "mufa", "n3fat", "n6fat", "choles",
  "vitA", "bcarotene", "vitC", "vitE",
  "vitB6", "vitB12", "folicacid",
  "thiamin", "riboflavin", "niacin",
  "iron", "mg", "zn", "se",
  "fiber", "caffeine"
)

nutrient_labels <- c(
  kcal       = "Total energy intake (kcal)",
  carb       = "Carbohydrate intake",
  protein    = "Protein intake",
  totalfat   = "Total fat intake",
  satfat     = "Saturated fat intake",
  pufa       = "Polyunsaturated fat intake",
  mufa       = "Monounsaturated fat intake",
  n3fat      = "Omega-3 fat intake",
  n6fat      = "Omega-6 fat intake",
  choles     = "Cholesterol intake",
  vitA       = "Vitamin A intake",
  bcarotene  = "Beta-carotene intake",
  vitC       = "Vitamin C intake",
  vitE       = "Vitamin E intake",
  vitB6      = "Vitamin B6 intake",
  vitB12     = "Vitamin B12 intake",
  folicacid  = "Folic acid intake",
  thiamin    = "Thiamin intake",
  riboflavin = "Riboflavin intake",
  niacin     = "Niacin intake",
  iron       = "Iron intake",
  mg         = "Magnesium intake",
  zn         = "Zinc intake",
  se         = "Selenium intake",
  fiber      = "Fiber intake",
  caffeine   = "Caffeine intake"
)

dii_nutrient_quart_predictors <- paste0(dii_nutrient_bases, "_quart")

## Univariable analysis 2 helper function
run_uni_model2 <- function(outcome_name, outcome_info, predictor) {
  des <- outcome_info$design
  y   <- outcome_info$var

  if (!predictor %in% names(des$variables)) {
    warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
    return(tibble())
  }

  form_uni <- as.formula(paste0(y, " ~ ", predictor))

  fit <- svyglm(
    form_uni,
    design = des,
    family = quasibinomial()
  )

  td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
    filter(term != "(Intercept)") |>
    mutate(
      outcome     = outcome_name,
      outcome_lab = outcome_info$label,
      predictor   = predictor
    )

  base       <- sub("_quart$", "", predictor)
  base_label <- nutrient_labels[[base]]
  if (is.null(base_label)) base_label <- base

  var_data <- des$variables[[predictor]]
  if (is.character(var_data)) {
    var_data <- factor(var_data)
  }

  if (is.factor(var_data)) {
    levels_vec <- levels(var_data)
    pattern    <- paste0("^", predictor)

    td_nonref <- td |>
      mutate(
        level          = sub(pattern, "", term),
        variable_label = paste0(base_label, ": ", level)
      )

    ref_level <- levels_vec[1]
    ref_label <- paste0(base_label, ": ", ref_level)

    ref_row <- tibble(
      term           = paste0(predictor, ref_level),
      estimate       = 1,
      conf.low       = 1,
      conf.high      = 1,
      p.value        = NA_real_,
      outcome        = outcome_name,
      outcome_lab    = outcome_info$label,
      predictor      = predictor,
      level          = ref_level,
      variable_label = ref_label
    )

    bind_rows(ref_row, td_nonref)
  } else {
    td |>
      mutate(variable_label = base_label)
  }
}

## Univariable analysis 2 primary outcomes
uni2_primary_long <- map_dfr(
  names(outcome_list),
  function(out_nm) {
    info <- outcome_list[[out_nm]]
    map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
  }
)

uni2_primary_long <- uni2_primary_long |>
  rename(
    OR  = estimate,
    LCL = conf.low,
    UCL = conf.high,
    p   = p.value
  ) |>
  mutate(
    p_fdr   = p.adjust(p, method = "BH"),
    or_ci   = if_else(
      is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
      "1 (Ref)",
      sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
    ),
    p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
  ) |>
  mutate(
    base           = sub("_quart$", "", predictor),
    base           = factor(base, levels = dii_nutrient_bases),
    variable_label = factor(variable_label)
  ) |>
  arrange(base, variable_label, outcome_lab)

uni2_primary_wide <- uni2_primary_long |>
  select(
    variable_label,
    outcome_lab,
    or_ci,
    p_fdr_fmt
  ) |>
  pivot_wider(
    id_cols    = variable_label,
    names_from = outcome_lab,
    values_from = c(or_ci, p_fdr_fmt),
    names_glue = "{outcome_lab} {.value}"
  ) |>
  rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
  rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
  rename(Variable = variable_label)

uni2_primary_outcomes_order <- c(
  "Asthma",
  "Allergic rhinitis",
  "Inflammatory arthritis",
  "Diabetes",
  "Hypertension"
)

uni2_primary_col_order <- c(
  "Variable",
  as.vector(rbind(
    paste0(uni2_primary_outcomes_order, " OR (95% CI)"),
    paste0(uni2_primary_outcomes_order, " p value")
  ))
)

uni2_primary_wide <- uni2_primary_wide |>
  select(all_of(uni2_primary_col_order))

tbl_uni2_primary <- make_uni_gt(
  uni2_primary_wide,
  "**Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)

## Univariable analysis 2 secondary outcomes
uni2_secondary_long <- map_dfr(
  names(outcome_list_secondary),
  function(out_nm) {
    info <- outcome_list_secondary[[out_nm]]
    map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
  }
)

uni2_secondary_long <- uni2_secondary_long |>
  rename(
    OR  = estimate,
    LCL = conf.low,
    UCL = conf.high,
    p   = p.value
  ) |>
  mutate(
    p_fdr   = p.adjust(p, method = "BH"),
    or_ci   = if_else(
      is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
      "1 (Ref)",
      sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
    ),
    p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
  ) |>
  mutate(
    base           = sub("_quart$", "", predictor),
    base           = factor(base, levels = dii_nutrient_bases),
    variable_label = factor(variable_label)
  ) |>
  arrange(base, variable_label, outcome_lab)

uni2_secondary_wide <- uni2_secondary_long |>
  select(
    variable_label,
    outcome_lab,
    or_ci,
    p_fdr_fmt
  ) |>
  pivot_wider(
    id_cols    = variable_label,
    names_from = outcome_lab,
    values_from = c(or_ci, p_fdr_fmt),
    names_glue = "{outcome_lab} {.value}"
  ) |>
  rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
  rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
  rename(Variable = variable_label)

uni2_secondary_outcomes_order <- c(
  "Atopic dermatitis",
  "Food allergy"
)

uni2_secondary_col_order <- c(
  "Variable",
  as.vector(rbind(
    paste0(uni2_secondary_outcomes_order, " OR (95% CI)"),
    paste0(uni2_secondary_outcomes_order, " p value")
  ))
)

uni2_secondary_wide <- uni2_secondary_wide |>
  select(all_of(uni2_secondary_col_order))

tbl_uni2_secondary <- make_uni_gt(
  uni2_secondary_wide,
  "**Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy**"
)

# Print univariable analysis tables------------------------------------------------------------
tbl_uni1_primary
Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension
Variable Asthma OR (95% CI) Asthma p value1 Allergic rhinitis OR (95% CI) Allergic rhinitis p value1 Inflammatory arthritis OR (95% CI) Inflammatory arthritis p value1 Diabetes OR (95% CI) Diabetes p value1 Hypertension OR (95% CI) Hypertension p value1
Age (years) 0.97 (0.96, 0.99) <0.001 1.06 (1.05, 1.08) <0.001 1.10 (1.05, 1.15) <0.001 1.11 (1.07, 1.14) <0.001 1.09 (1.07, 1.10) <0.001
Sex: Male 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Sex: Female 1.35 (1.17, 1.56) <0.001 1.39 (1.19, 1.61) <0.001 2.31 (1.53, 3.49) <0.001 1.25 (0.85, 1.83) 0.363 0.83 (0.71, 0.97) 0.050
Race/ethnicity: Non-Hispanic White 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Race/ethnicity: Non-Hispanic Black 1.04 (0.87, 1.24) 0.799 0.62 (0.50, 0.78) <0.001 0.91 (0.52, 1.58) 0.833 1.96 (1.30, 2.95) 0.007 1.69 (1.39, 2.05) <0.001
Race/ethnicity: Mexican American 0.42 (0.32, 0.55) <0.001 0.35 (0.27, 0.45) <0.001 0.50 (0.26, 0.96) 0.088 1.04 (0.66, 1.64) 0.912 0.80 (0.62, 1.03) 0.151
Race/ethnicity: Other Hispanic 0.77 (0.57, 1.03) 0.144 0.62 (0.42, 0.92) 0.049 0.48 (0.20, 1.16) 0.182 1.66 (1.00, 2.75) 0.101 0.82 (0.60, 1.11) 0.296
Insurance: No 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Insurance: Yes 1.25 (1.05, 1.50) 0.043 1.46 (1.19, 1.79) 0.002 2.22 (1.27, 3.89) 0.020 1.40 (0.95, 2.07) 0.155 1.31 (1.12, 1.54) 0.004
Education: <9 years 0.36 (0.20, 0.65) 0.004 0.44 (0.27, 0.71) 0.004 1.80 (0.63, 5.11) 0.376 1.47 (0.67, 3.21) 0.450 0.76 (0.48, 1.20) 0.345
Education: 9–12 years 0.95 (0.81, 1.12) 0.690 0.62 (0.52, 0.75) <0.001 1.28 (0.79, 2.08) 0.421 1.27 (0.87, 1.87) 0.327 1.12 (0.95, 1.33) 0.269
Education: >12 years 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Poverty-income ratio 0.95 (0.90, 1.01) 0.190 1.12 (1.06, 1.18) <0.001 0.84 (0.73, 0.96) 0.040 0.87 (0.79, 0.96) 0.019 0.95 (0.90, 1.00) 0.109
Food security: Full 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Food security: Marginal 1.24 (0.93, 1.65) 0.238 0.71 (0.56, 0.90) 0.020 1.78 (1.01, 3.11) 0.095 1.59 (0.88, 2.86) 0.210 1.34 (1.03, 1.74) 0.072
Food security: Low 0.90 (0.60, 1.35) 0.747 0.63 (0.33, 1.18) 0.238 1.08 (0.31, 3.73) 0.948 1.02 (0.37, 2.77) 0.984 0.97 (0.61, 1.55) 0.948
Food security: Very low 2.53 (1.78, 3.59) <0.001 0.60 (0.36, 0.99) 0.099 3.16 (1.10, 9.07) 0.076 2.91 (0.91, 9.31) 0.136 1.04 (0.48, 2.27) 0.952
Smoking status: Never 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Smoking status: Current 1.48 (1.22, 1.79) <0.001 0.96 (0.76, 1.21) 0.833 2.64 (1.63, 4.27) <0.001 1.04 (0.69, 1.57) 0.910 1.60 (1.28, 2.00) <0.001
Smoking status: Former 1.04 (0.81, 1.34) 0.833 1.26 (0.95, 1.69) 0.190 1.42 (0.70, 2.88) 0.445 0.89 (0.46, 1.73) 0.833 1.48 (1.14, 1.93) 0.014
Physical activity: Sedentary 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Physical activity: Moderate 1.14 (0.94, 1.38) 0.296 1.45 (1.18, 1.78) 0.002 1.48 (0.90, 2.44) 0.210 1.04 (0.69, 1.58) 0.912 1.03 (0.83, 1.28) 0.861
Physical activity: Intense 1.13 (0.97, 1.31) 0.215 1.23 (1.02, 1.47) 0.073 0.52 (0.29, 0.94) 0.073 0.41 (0.27, 0.62) <0.001 0.74 (0.58, 0.93) 0.033
Alcohol use: No 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Alcohol use: Yes 1.26 (1.02, 1.55) 0.073 1.21 (1.01, 1.46) 0.095 0.70 (0.43, 1.13) 0.238 0.63 (0.42, 0.95) 0.069 1.12 (0.93, 1.36) 0.333
Systolic blood pressure category: <120 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Systolic blood pressure category: 120–129 1.12 (0.93, 1.36) 0.336 0.91 (0.74, 1.12) 0.486 0.76 (0.34, 1.72) 0.634 2.88 (1.82, 4.56) <0.001 2.98 (2.31, 3.83) <0.001
Systolic blood pressure category: 130–139 0.91 (0.70, 1.19) 0.634 0.88 (0.60, 1.29) 0.634 1.17 (0.57, 2.43) 0.786 3.29 (1.70, 6.39) 0.003 5.86 (4.21, 8.16) <0.001
Systolic blood pressure category: 140–159 0.94 (0.60, 1.45) 0.843 0.85 (0.59, 1.24) 0.522 0.00 (0.00, 0.00) <0.001 2.55 (1.39, 4.68) 0.011 8.90 (6.12, 12.92) <0.001
Systolic blood pressure category: ≥160 1.24 (0.44, 3.55) 0.799 0.65 (0.18, 2.27) 0.631 0.00 (0.00, 0.00) <0.001 9.29 (1.92, 45.01) 0.020 50.34 (21.90, 115.71) <0.001
Diastolic blood pressure category: <80 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Diastolic blood pressure category: 80–89 1.11 (0.89, 1.39) 0.483 1.31 (0.99, 1.73) 0.116 0.97 (0.48, 1.95) 0.963 2.81 (1.86, 4.24) <0.001 3.14 (2.48, 3.98) <0.001
Diastolic blood pressure category: 90–99 0.89 (0.56, 1.42) 0.757 0.91 (0.57, 1.48) 0.832 2.08 (0.61, 7.14) 0.348 3.82 (1.91, 7.63) 0.001 6.09 (4.47, 8.30) <0.001
Diastolic blood pressure category: ≥100 1.10 (0.58, 2.09) 0.843 0.39 (0.11, 1.37) 0.234 0.00 (0.00, 0.00) <0.001 1.43 (0.30, 6.68) 0.780 13.34 (7.50, 23.73) <0.001
BMI category: Underweight (<18.5) 0.54 (0.30, 0.97) 0.089 0.73 (0.41, 1.31) 0.404 0.27 (0.03, 2.40) 0.348 0.00 (0.00, 0.00) <0.001 0.58 (0.23, 1.42) 0.338
BMI category: Normal (18.5–24.9) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
BMI category: Overweight (25–29.9) 0.98 (0.82, 1.18) 0.896 0.93 (0.71, 1.21) 0.701 1.00 (0.41, 2.45) 0.994 1.71 (0.98, 2.99) 0.116 1.73 (1.29, 2.30) 0.001
BMI category: Obesity I (30–34.9) 1.25 (0.98, 1.58) 0.138 0.87 (0.69, 1.11) 0.370 2.02 (0.90, 4.51) 0.157 2.07 (1.10, 3.89) 0.064 3.33 (2.60, 4.27) <0.001
BMI category: Obesity II (35–39.9) 1.46 (1.09, 1.97) 0.038 1.00 (0.70, 1.42) 0.994 2.39 (1.28, 4.47) 0.021 4.71 (2.29, 9.71) <0.001 4.89 (3.71, 6.44) <0.001
BMI category: Obesity III (≥40) 1.63 (1.19, 2.22) 0.010 0.91 (0.60, 1.38) 0.783 3.12 (1.34, 7.28) 0.028 10.89 (6.50, 18.26) <0.001 7.04 (5.26, 9.41) <0.001
Waist-to-height ratio category: <0.5 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Waist-to-height ratio category: 0.5–0.59 0.89 (0.74, 1.08) 0.363 1.00 (0.82, 1.22) 0.994 1.13 (0.52, 2.47) 0.837 1.03 (0.52, 2.02) 0.968 2.01 (1.56, 2.59) <0.001
Waist-to-height ratio category: ≥0.6 1.32 (1.08, 1.61) 0.021 1.02 (0.81, 1.29) 0.913 1.97 (0.96, 4.04) 0.125 5.14 (2.75, 9.63) <0.001 4.87 (3.89, 6.10) <0.001
HDL cholesterol (mg/dL) 1.00 (1.00, 1.01) 0.833 1.00 (1.00, 1.01) 0.445 1.00 (0.98, 1.02) 0.970 0.98 (0.97, 1.00) 0.082 0.97 (0.96, 0.98) <0.001
Non-HDL cholesterol (mg/dL) 1.00 (1.00, 1.00) 0.017 1.00 (1.00, 1.00) 0.531 1.00 (1.00, 1.01) 0.333 1.01 (1.01, 1.01) <0.001 1.01 (1.01, 1.01) <0.001
Total cholesterol (mg/dL) 1.00 (1.00, 1.00) 0.010 1.00 (1.00, 1.00) 0.250 1.00 (1.00, 1.01) 0.288 1.01 (1.00, 1.01) 0.002 1.01 (1.00, 1.01) <0.001
HbA1c (%) 0.98 (0.88, 1.10) 0.845 1.08 (0.97, 1.21) 0.268 1.17 (1.00, 1.37) 0.114 4.52 (3.18, 6.44) <0.001 1.50 (1.38, 1.63) <0.001
Vitamin D category: <20 (deficient) 0.87 (0.51, 1.47) 0.725 0.66 (0.32, 1.37) 0.369 1.14 (0.26, 5.00) 0.918 0.61 (0.14, 2.62) 0.634 1.79 (1.05, 3.04) 0.073
Vitamin D category: 20–29 (insufficient) 0.99 (0.72, 1.36) 0.970 0.69 (0.47, 1.02) 0.116 0.87 (0.41, 1.88) 0.833 1.81 (1.12, 2.92) 0.046 1.18 (0.87, 1.60) 0.381
Vitamin D category: ≥30 (sufficient) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin D category trend 1.04 (0.84, 1.29) 0.833 1.35 (1.03, 1.77) 0.069 1.04 (0.57, 1.91) 0.948 0.76 (0.54, 1.07) 0.197 0.79 (0.63, 1.00) 0.099
DII 1.01 (0.98, 1.05) 0.606 0.96 (0.92, 0.99) 0.050 1.15 (1.01, 1.30) 0.089 1.05 (0.97, 1.14) 0.327 1.05 (1.01, 1.09) 0.050
DII quartile trend 1.03 (0.96, 1.10) 0.606 0.93 (0.86, 1.00) 0.095 1.30 (1.00, 1.68) 0.099 1.11 (0.95, 1.30) 0.300 1.09 (1.00, 1.18) 0.099
DII including alcohol 1.01 (0.98, 1.04) 0.634 0.96 (0.92, 0.99) 0.044 1.15 (1.01, 1.31) 0.072 1.06 (0.98, 1.14) 0.210 1.05 (1.01, 1.08) 0.049
1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method
tbl_uni1_secondary
Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy
Variable Atopic dermatitis OR (95% CI) Atopic dermatitis p value1 Food allergy OR (95% CI) Food allergy p value1
Age (years) 0.99 (0.94, 1.04) 0.837 1.00 (0.99, 1.02) 0.775
Sex: Male 1 (Ref) 1 (Ref)
Sex: Female 1.68 (0.95, 2.98) 0.271 1.40 (1.09, 1.79) 0.088
Race/ethnicity: Non-Hispanic White 1 (Ref) 1 (Ref)
Race/ethnicity: Non-Hispanic Black 1.00 (0.55, 1.83) 0.993 1.57 (1.15, 2.14) 0.058
Race/ethnicity: Mexican American 0.19 (0.08, 0.44) 0.013 0.62 (0.48, 0.81) 0.013
Race/ethnicity: Other Hispanic 0.21 (0.06, 0.74) 0.097 1.41 (0.94, 2.13) 0.316
Insurance: No 1 (Ref) 1 (Ref)
Insurance: Yes 1.57 (0.90, 2.74) 0.316 1.33 (0.94, 1.87) 0.316
Education: <9 years 0.10 (0.06, 0.18) <0.001 0.47 (0.23, 0.96) 0.169
Education: 9–12 years 0.99 (0.54, 1.82) 0.993 0.64 (0.50, 0.83) 0.013
Education: >12 years 1 (Ref) 1 (Ref)
Poverty-income ratio 1.05 (0.92, 1.20) 0.698 1.02 (0.95, 1.11) 0.770
Food security: Full 1 (Ref) 1 (Ref)
Food security: Marginal 1.54 (0.95, 2.50) 0.271 0.78 (0.50, 1.22) 0.571
Food security: Low 0.95 (0.36, 2.54) 0.993 NA NA
Food security: Very low 1.46 (0.60, 3.52) 0.659 NA NA
Smoking status: Never 1 (Ref) 1 (Ref)
Smoking status: Current 1.69 (1.02, 2.79) 0.179 0.94 (0.70, 1.26) 0.837
Smoking status: Former 1.06 (0.37, 3.04) 0.993 1.27 (0.88, 1.83) 0.502
Physical activity: Sedentary 1 (Ref) 1 (Ref)
Physical activity: Moderate 1.30 (0.50, 3.34) 0.775 1.23 (0.84, 1.80) 0.577
Physical activity: Intense 1.28 (0.57, 2.87) 0.770 1.16 (0.83, 1.61) 0.659
Alcohol use: No 1 (Ref) 1 (Ref)
Alcohol use: Yes 1.58 (0.68, 3.67) 0.571 0.77 (0.60, 1.00) 0.186
Systolic blood pressure category: <120 1 (Ref) 1 (Ref)
Systolic blood pressure category: 120–129 1.05 (0.56, 1.97) 0.988 0.59 (0.38, 0.91) 0.097
Systolic blood pressure category: 130–139 1.02 (0.37, 2.78) 0.993 1.05 (0.61, 1.80) 0.988
Systolic blood pressure category: 140–159 0.97 (0.25, 3.75) 0.993 0.67 (0.30, 1.49) 0.617
Systolic blood pressure category: ≥160 0.00 (0.00, 0.00) <0.001 0.00 (0.00, 0.00) <0.001
Diastolic blood pressure category: <80 1 (Ref) 1 (Ref)
Diastolic blood pressure category: 80–89 1.60 (0.60, 4.27) 0.617 0.98 (0.65, 1.49) 0.993
Diastolic blood pressure category: 90–99 0.44 (0.10, 1.99) 0.571 1.60 (0.89, 2.86) 0.316
Diastolic blood pressure category: ≥100 0.00 (0.00, 0.00) <0.001 1.69 (0.42, 6.86) 0.717
BMI category: Underweight (<18.5) 0.39 (0.07, 2.21) 0.571 1.00 (0.35, 2.80) 0.993
BMI category: Normal (18.5–24.9) 1 (Ref) 1 (Ref)
BMI category: Overweight (25–29.9) 0.51 (0.30, 0.86) 0.097 0.94 (0.68, 1.30) 0.847
BMI category: Obesity I (30–34.9) 0.81 (0.41, 1.60) 0.755 1.04 (0.71, 1.52) 0.951
BMI category: Obesity II (35–39.9) 0.81 (0.33, 2.00) 0.801 0.98 (0.56, 1.69) 0.993
BMI category: Obesity III (≥40) 2.22 (1.24, 3.97) 0.088 1.20 (0.76, 1.90) 0.698
Waist-to-height ratio category: <0.5 1 (Ref) 1 (Ref)
Waist-to-height ratio category: 0.5–0.59 0.81 (0.44, 1.49) 0.725 0.87 (0.62, 1.20) 0.659
Waist-to-height ratio category: ≥0.6 1.22 (0.72, 2.09) 0.698 1.04 (0.73, 1.49) 0.951
HDL cholesterol (mg/dL) 1.00 (0.99, 1.01) 0.725 1.00 (0.99, 1.01) 0.775
Non-HDL cholesterol (mg/dL) 1.00 (0.99, 1.01) 0.770 1.00 (0.99, 1.00) 0.329
Total cholesterol (mg/dL) 1.00 (0.99, 1.01) 0.775 1.00 (0.99, 1.00) 0.316
HbA1c (%) 0.71 (0.38, 1.32) 0.571 0.92 (0.76, 1.11) 0.659
Vitamin D category: <20 (deficient) 1.54 (0.13, 17.72) 0.847 2.04 (0.92, 4.51) 0.271
Vitamin D category: 20–29 (insufficient) 0.79 (0.34, 1.85) 0.775 1.65 (1.11, 2.45) 0.097
Vitamin D category: ≥30 (sufficient) 1 (Ref) 1 (Ref)
Vitamin D category trend 1.14 (0.59, 2.17) 0.837 0.66 (0.48, 0.91) 0.088
DII 1.07 (0.96, 1.19) 0.560 0.95 (0.91, 0.99) 0.097
DII quartile trend 1.12 (0.85, 1.47) 0.698 0.93 (0.84, 1.03) 0.397
DII including alcohol 1.06 (0.94, 1.19) 0.617 0.95 (0.92, 0.99) 0.117
1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method
tbl_uni2_primary
Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension
Variable Asthma OR (95% CI) Asthma p value1 Allergic rhinitis OR (95% CI) Allergic rhinitis p value1 Inflammatory arthritis OR (95% CI) Inflammatory arthritis p value1 Diabetes OR (95% CI) Diabetes p value1 Hypertension OR (95% CI) Hypertension p value1
Total energy intake (kcal): Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Total energy intake (kcal): Q2 0.92 (0.75, 1.14) 0.781 1.27 (1.00, 1.62) 0.366 0.37 (0.19, 0.73) 0.100 0.83 (0.51, 1.35) 0.781 1.16 (0.89, 1.53) 0.646
Total energy intake (kcal): Q3 0.74 (0.59, 0.93) 0.167 1.05 (0.85, 1.30) 0.856 0.31 (0.16, 0.60) 0.047 0.78 (0.48, 1.27) 0.655 1.08 (0.85, 1.39) 0.821
Total energy intake (kcal): Q4 1.01 (0.80, 1.26) 0.968 1.22 (0.95, 1.57) 0.524 0.51 (0.25, 1.01) 0.373 0.66 (0.36, 1.19) 0.555 1.28 (1.01, 1.61) 0.308
Carbohydrate intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Carbohydrate intake: Q2 0.92 (0.72, 1.18) 0.821 1.35 (1.06, 1.74) 0.213 0.76 (0.40, 1.45) 0.750 0.63 (0.39, 1.00) 0.371 1.14 (0.89, 1.46) 0.663
Carbohydrate intake: Q3 0.83 (0.67, 1.03) 0.483 1.18 (0.92, 1.51) 0.574 0.26 (0.13, 0.49) 0.019 0.44 (0.25, 0.80) 0.133 0.99 (0.78, 1.27) 0.971
Carbohydrate intake: Q4 1.04 (0.82, 1.32) 0.883 1.13 (0.87, 1.47) 0.703 0.73 (0.37, 1.45) 0.725 0.37 (0.21, 0.68) 0.062 1.12 (0.90, 1.39) 0.653
Protein intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Protein intake: Q2 0.84 (0.67, 1.06) 0.531 1.15 (0.87, 1.53) 0.665 0.49 (0.27, 0.90) 0.237 1.23 (0.75, 2.01) 0.763 0.99 (0.79, 1.25) 0.968
Protein intake: Q3 0.89 (0.70, 1.11) 0.653 1.23 (0.96, 1.59) 0.511 0.54 (0.28, 1.02) 0.382 1.22 (0.72, 2.07) 0.781 1.14 (0.90, 1.44) 0.653
Protein intake: Q4 0.80 (0.64, 1.02) 0.419 1.24 (0.94, 1.63) 0.524 0.50 (0.26, 0.94) 0.291 1.15 (0.74, 1.78) 0.821 1.18 (0.95, 1.46) 0.524
Total fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Total fat intake: Q2 1.04 (0.81, 1.34) 0.883 1.03 (0.79, 1.35) 0.900 0.48 (0.24, 0.95) 0.295 0.69 (0.41, 1.16) 0.555 0.89 (0.67, 1.17) 0.750
Total fat intake: Q3 0.96 (0.74, 1.25) 0.883 1.09 (0.87, 1.35) 0.781 0.51 (0.30, 0.87) 0.208 0.77 (0.43, 1.38) 0.727 1.11 (0.89, 1.39) 0.714
Total fat intake: Q4 1.06 (0.83, 1.35) 0.856 1.20 (0.94, 1.53) 0.524 0.60 (0.29, 1.21) 0.534 1.23 (0.73, 2.07) 0.781 1.30 (1.02, 1.66) 0.295
Saturated fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Saturated fat intake: Q2 0.94 (0.77, 1.14) 0.816 1.09 (0.86, 1.39) 0.781 0.76 (0.38, 1.54) 0.781 0.73 (0.43, 1.23) 0.610 1.01 (0.78, 1.31) 0.971
Saturated fat intake: Q3 0.96 (0.76, 1.23) 0.883 1.12 (0.88, 1.42) 0.703 0.64 (0.33, 1.25) 0.579 0.80 (0.50, 1.28) 0.703 1.01 (0.79, 1.29) 0.968
Saturated fat intake: Q4 0.97 (0.76, 1.24) 0.898 1.21 (0.94, 1.56) 0.531 0.82 (0.42, 1.57) 0.821 0.99 (0.59, 1.66) 0.971 1.29 (1.04, 1.60) 0.218
Polyunsaturated fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Polyunsaturated fat intake: Q2 0.94 (0.74, 1.20) 0.856 0.96 (0.75, 1.23) 0.877 0.73 (0.36, 1.48) 0.740 0.86 (0.47, 1.56) 0.852 0.94 (0.77, 1.16) 0.844
Polyunsaturated fat intake: Q3 0.97 (0.79, 1.19) 0.883 0.90 (0.70, 1.16) 0.750 0.86 (0.39, 1.90) 0.877 0.66 (0.39, 1.11) 0.524 0.97 (0.77, 1.21) 0.883
Polyunsaturated fat intake: Q4 1.16 (0.95, 1.43) 0.531 0.96 (0.77, 1.19) 0.875 1.73 (0.86, 3.51) 0.524 0.89 (0.53, 1.50) 0.860 0.85 (0.67, 1.08) 0.569
Monounsaturated fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Monounsaturated fat intake: Q2 0.88 (0.72, 1.09) 0.610 0.84 (0.65, 1.09) 0.578 0.64 (0.32, 1.29) 0.600 0.62 (0.35, 1.11) 0.511 0.87 (0.68, 1.10) 0.614
Monounsaturated fat intake: Q3 1.08 (0.89, 1.31) 0.781 0.86 (0.66, 1.12) 0.638 0.71 (0.39, 1.32) 0.646 0.49 (0.28, 0.83) 0.148 0.68 (0.53, 0.87) 0.082
Monounsaturated fat intake: Q4 1.03 (0.82, 1.29) 0.898 0.83 (0.64, 1.06) 0.524 1.58 (0.84, 2.95) 0.531 0.71 (0.42, 1.20) 0.586 0.81 (0.65, 1.01) 0.407
Omega-3 fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Omega-3 fat intake: Q2 0.98 (0.79, 1.21) 0.900 1.07 (0.86, 1.32) 0.834 1.61 (0.73, 3.54) 0.610 1.10 (0.61, 1.97) 0.883 0.95 (0.77, 1.16) 0.852
Omega-3 fat intake: Q3 0.90 (0.74, 1.10) 0.653 0.98 (0.78, 1.23) 0.916 0.76 (0.32, 1.83) 0.821 1.26 (0.70, 2.28) 0.781 1.02 (0.85, 1.24) 0.898
Omega-3 fat intake: Q4 1.11 (0.92, 1.35) 0.646 0.86 (0.67, 1.11) 0.614 2.88 (1.29, 6.41) 0.157 1.46 (0.89, 2.38) 0.524 0.96 (0.76, 1.21) 0.877
Omega-6 fat intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Omega-6 fat intake: Q2 0.94 (0.73, 1.21) 0.856 0.97 (0.75, 1.24) 0.892 0.81 (0.38, 1.75) 0.852 0.82 (0.45, 1.49) 0.821 0.84 (0.67, 1.04) 0.524
Omega-6 fat intake: Q3 0.98 (0.78, 1.22) 0.913 0.86 (0.68, 1.09) 0.600 0.95 (0.46, 1.98) 0.943 0.68 (0.41, 1.14) 0.531 0.90 (0.71, 1.14) 0.725
Omega-6 fat intake: Q4 1.13 (0.91, 1.40) 0.647 0.99 (0.79, 1.23) 0.957 1.69 (0.85, 3.37) 0.524 0.86 (0.52, 1.43) 0.834 0.84 (0.67, 1.06) 0.527
Cholesterol intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Cholesterol intake: Q2 0.99 (0.78, 1.26) 0.968 1.15 (0.89, 1.48) 0.653 0.71 (0.39, 1.29) 0.634 0.86 (0.44, 1.68) 0.861 0.96 (0.74, 1.26) 0.893
Cholesterol intake: Q3 0.83 (0.67, 1.03) 0.483 0.97 (0.78, 1.22) 0.898 0.61 (0.31, 1.20) 0.531 1.26 (0.73, 2.18) 0.748 1.19 (0.95, 1.50) 0.524
Cholesterol intake: Q4 0.81 (0.63, 1.03) 0.483 1.07 (0.82, 1.39) 0.852 0.63 (0.33, 1.22) 0.555 1.30 (0.75, 2.27) 0.703 1.39 (1.11, 1.73) 0.095
Vitamin A intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin A intake: Q2 0.92 (0.74, 1.16) 0.808 0.82 (0.65, 1.04) 0.511 0.91 (0.44, 1.87) 0.896 1.47 (0.84, 2.57) 0.569 1.08 (0.83, 1.40) 0.842
Vitamin A intake: Q3 0.80 (0.65, 1.00) 0.373 0.72 (0.57, 0.91) 0.125 0.90 (0.45, 1.82) 0.886 1.91 (1.13, 3.26) 0.213 1.16 (0.88, 1.54) 0.653
Vitamin A intake: Q4 1.05 (0.84, 1.31) 0.856 0.60 (0.46, 0.77) 0.019 1.22 (0.58, 2.55) 0.852 1.66 (0.86, 3.19) 0.524 1.23 (0.95, 1.60) 0.524
Beta-carotene intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Beta-carotene intake: Q2 1.02 (0.83, 1.26) 0.902 0.82 (0.64, 1.05) 0.524 0.76 (0.39, 1.45) 0.750 1.60 (0.93, 2.75) 0.481 1.08 (0.85, 1.38) 0.821
Beta-carotene intake: Q3 1.06 (0.83, 1.36) 0.856 0.65 (0.51, 0.83) 0.047 1.47 (0.72, 3.02) 0.653 1.54 (0.85, 2.78) 0.531 1.25 (0.97, 1.61) 0.477
Beta-carotene intake: Q4 1.11 (0.89, 1.39) 0.703 0.70 (0.56, 0.88) 0.081 2.05 (1.05, 3.99) 0.295 1.41 (0.82, 2.41) 0.596 1.34 (1.02, 1.76) 0.295
Vitamin C intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin C intake: Q2 1.17 (0.92, 1.49) 0.590 1.27 (1.03, 1.56) 0.250 1.20 (0.52, 2.80) 0.861 1.44 (0.89, 2.33) 0.528 1.09 (0.83, 1.44) 0.821
Vitamin C intake: Q3 1.23 (1.02, 1.48) 0.291 0.93 (0.69, 1.26) 0.856 1.92 (0.99, 3.74) 0.373 1.06 (0.60, 1.88) 0.900 1.14 (0.86, 1.51) 0.713
Vitamin C intake: Q4 1.23 (1.00, 1.51) 0.371 0.90 (0.68, 1.19) 0.781 2.46 (1.12, 5.40) 0.255 1.18 (0.69, 2.00) 0.821 1.31 (0.98, 1.75) 0.429
Vitamin E intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin E intake: Q2 0.82 (0.64, 1.05) 0.524 0.97 (0.78, 1.21) 0.898 0.93 (0.50, 1.73) 0.900 0.56 (0.31, 1.01) 0.373 1.08 (0.80, 1.46) 0.852
Vitamin E intake: Q3 0.94 (0.74, 1.20) 0.856 0.82 (0.65, 1.03) 0.483 1.01 (0.53, 1.94) 0.981 0.83 (0.46, 1.50) 0.821 1.10 (0.85, 1.42) 0.781
Vitamin E intake: Q4 0.95 (0.76, 1.20) 0.871 0.67 (0.54, 0.82) 0.024 1.53 (0.77, 3.04) 0.602 1.10 (0.69, 1.77) 0.868 1.15 (0.89, 1.48) 0.646
Vitamin B6 intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin B6 intake: Q2 0.93 (0.78, 1.11) 0.745 0.97 (0.77, 1.22) 0.898 0.77 (0.35, 1.67) 0.811 1.18 (0.70, 1.99) 0.821 0.88 (0.68, 1.13) 0.653
Vitamin B6 intake: Q3 0.96 (0.78, 1.18) 0.877 0.90 (0.70, 1.16) 0.759 1.51 (0.78, 2.89) 0.600 1.24 (0.77, 1.99) 0.727 1.04 (0.84, 1.29) 0.875
Vitamin B6 intake: Q4 1.10 (0.89, 1.36) 0.725 0.82 (0.66, 1.03) 0.477 2.27 (1.13, 4.56) 0.242 1.17 (0.65, 2.08) 0.852 1.15 (0.91, 1.45) 0.610
Vitamin B12 intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Vitamin B12 intake: Q2 1.02 (0.82, 1.27) 0.916 1.11 (0.90, 1.38) 0.686 0.87 (0.47, 1.59) 0.856 0.74 (0.44, 1.22) 0.610 1.01 (0.75, 1.35) 0.974
Vitamin B12 intake: Q3 0.87 (0.69, 1.09) 0.600 1.03 (0.78, 1.36) 0.898 1.07 (0.59, 1.93) 0.898 0.67 (0.40, 1.12) 0.524 0.93 (0.72, 1.19) 0.821
Vitamin B12 intake: Q4 1.03 (0.83, 1.28) 0.898 1.20 (0.94, 1.52) 0.524 0.59 (0.27, 1.32) 0.585 0.76 (0.42, 1.36) 0.703 1.15 (0.90, 1.48) 0.646
Folic acid intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Folic acid intake: Q2 1.28 (1.03, 1.58) 0.250 0.92 (0.72, 1.16) 0.787 0.96 (0.46, 2.01) 0.957 0.80 (0.51, 1.23) 0.653 1.20 (0.92, 1.57) 0.559
Folic acid intake: Q3 1.05 (0.85, 1.30) 0.856 0.91 (0.72, 1.15) 0.781 1.33 (0.63, 2.82) 0.781 1.22 (0.72, 2.09) 0.781 1.21 (0.96, 1.53) 0.509
Folic acid intake: Q4 1.23 (0.97, 1.55) 0.477 0.98 (0.77, 1.24) 0.907 1.61 (0.79, 3.31) 0.579 1.23 (0.70, 2.15) 0.786 1.44 (1.16, 1.80) 0.058
Thiamin intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Thiamin intake: Q2 0.96 (0.75, 1.23) 0.883 0.93 (0.73, 1.19) 0.842 1.35 (0.58, 3.13) 0.793 1.03 (0.67, 1.60) 0.935 1.04 (0.84, 1.28) 0.877
Thiamin intake: Q3 0.87 (0.70, 1.08) 0.589 0.96 (0.76, 1.20) 0.877 1.21 (0.54, 2.68) 0.856 0.96 (0.57, 1.61) 0.918 1.07 (0.86, 1.32) 0.821
Thiamin intake: Q4 1.17 (0.92, 1.49) 0.585 0.81 (0.64, 1.03) 0.477 2.69 (1.20, 6.01) 0.213 1.12 (0.62, 2.03) 0.875 1.05 (0.85, 1.30) 0.860
Riboflavin intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Riboflavin intake: Q2 0.82 (0.67, 1.01) 0.376 0.85 (0.67, 1.07) 0.555 1.04 (0.51, 2.13) 0.957 1.05 (0.69, 1.60) 0.898 1.05 (0.85, 1.30) 0.859
Riboflavin intake: Q3 0.96 (0.75, 1.22) 0.877 0.90 (0.73, 1.10) 0.653 1.28 (0.63, 2.61) 0.811 1.03 (0.57, 1.86) 0.958 1.04 (0.81, 1.34) 0.883
Riboflavin intake: Q4 1.03 (0.84, 1.27) 0.892 0.61 (0.46, 0.80) 0.047 1.60 (0.80, 3.19) 0.571 1.10 (0.64, 1.88) 0.877 1.05 (0.80, 1.37) 0.883
Niacin intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Niacin intake: Q2 1.05 (0.83, 1.33) 0.860 1.08 (0.85, 1.37) 0.821 0.82 (0.38, 1.78) 0.856 1.58 (0.93, 2.69) 0.483 1.07 (0.83, 1.37) 0.852
Niacin intake: Q3 1.02 (0.82, 1.26) 0.935 1.16 (0.93, 1.45) 0.571 1.50 (0.76, 2.94) 0.614 1.42 (0.78, 2.58) 0.614 1.03 (0.85, 1.24) 0.893
Niacin intake: Q4 1.16 (0.94, 1.43) 0.544 0.94 (0.73, 1.20) 0.852 1.41 (0.74, 2.68) 0.653 1.11 (0.63, 1.96) 0.877 1.02 (0.83, 1.24) 0.916
Iron intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Iron intake: Q2 0.85 (0.70, 1.03) 0.483 1.01 (0.80, 1.28) 0.957 0.73 (0.43, 1.24) 0.614 1.00 (0.58, 1.73) 0.996 1.04 (0.83, 1.31) 0.877
Iron intake: Q3 0.88 (0.71, 1.10) 0.638 1.13 (0.88, 1.44) 0.686 0.44 (0.22, 0.90) 0.255 0.78 (0.43, 1.41) 0.759 1.01 (0.82, 1.25) 0.962
Iron intake: Q4 0.84 (0.67, 1.05) 0.524 1.20 (0.93, 1.56) 0.544 0.57 (0.28, 1.19) 0.524 0.89 (0.48, 1.65) 0.877 0.92 (0.74, 1.15) 0.787
Magnesium intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Magnesium intake: Q2 0.91 (0.72, 1.16) 0.781 0.95 (0.72, 1.26) 0.877 1.28 (0.54, 3.05) 0.842 0.85 (0.43, 1.69) 0.856 0.96 (0.74, 1.25) 0.883
Magnesium intake: Q3 0.94 (0.75, 1.17) 0.844 0.87 (0.69, 1.09) 0.600 1.79 (0.83, 3.88) 0.524 1.16 (0.69, 1.95) 0.844 0.95 (0.76, 1.19) 0.868
Magnesium intake: Q4 1.16 (0.93, 1.44) 0.571 0.72 (0.55, 0.94) 0.213 2.88 (1.34, 6.18) 0.131 1.23 (0.70, 2.14) 0.786 1.04 (0.80, 1.36) 0.883
Zinc intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Zinc intake: Q2 0.94 (0.75, 1.18) 0.852 1.03 (0.82, 1.30) 0.883 0.96 (0.46, 2.01) 0.957 0.75 (0.39, 1.45) 0.741 0.66 (0.51, 0.86) 0.082
Zinc intake: Q3 1.14 (0.90, 1.43) 0.646 1.08 (0.85, 1.36) 0.821 0.80 (0.38, 1.68) 0.826 0.89 (0.45, 1.75) 0.877 0.81 (0.67, 0.99) 0.295
Zinc intake: Q4 1.09 (0.87, 1.37) 0.781 0.91 (0.70, 1.18) 0.781 1.53 (0.77, 3.02) 0.602 1.13 (0.66, 1.94) 0.859 0.88 (0.71, 1.09) 0.608
Selenium intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Selenium intake: Q2 1.23 (0.99, 1.53) 0.391 1.17 (0.89, 1.52) 0.632 1.17 (0.54, 2.54) 0.875 1.37 (0.79, 2.37) 0.634 1.05 (0.82, 1.33) 0.877
Selenium intake: Q3 1.10 (0.89, 1.36) 0.725 0.88 (0.70, 1.11) 0.653 1.05 (0.51, 2.18) 0.935 1.47 (0.89, 2.42) 0.524 0.97 (0.78, 1.20) 0.883
Selenium intake: Q4 1.39 (1.11, 1.73) 0.100 0.97 (0.77, 1.23) 0.896 2.11 (1.07, 4.14) 0.291 1.09 (0.63, 1.87) 0.883 0.87 (0.69, 1.11) 0.632
Fiber intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Fiber intake: Q2 0.95 (0.75, 1.20) 0.870 0.92 (0.72, 1.19) 0.821 1.25 (0.50, 3.18) 0.856 1.00 (0.58, 1.71) 0.990 1.18 (0.92, 1.53) 0.579
Fiber intake: Q3 1.00 (0.80, 1.27) 0.978 0.80 (0.65, 0.98) 0.295 1.23 (0.52, 2.91) 0.856 1.17 (0.67, 2.05) 0.842 1.09 (0.85, 1.41) 0.811
Fiber intake: Q4 1.18 (0.92, 1.53) 0.579 0.69 (0.54, 0.88) 0.094 3.02 (1.32, 6.94) 0.156 1.19 (0.68, 2.09) 0.821 1.45 (1.16, 1.82) 0.058
Caffeine intake: Q1 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref) 1 (Ref)
Caffeine intake: Q2 0.78 (0.64, 0.96) 0.213 0.83 (0.66, 1.04) 0.509 0.38 (0.19, 0.75) 0.125 0.87 (0.48, 1.57) 0.856 0.85 (0.65, 1.11) 0.603
Caffeine intake: Q3 0.72 (0.58, 0.89) 0.081 0.47 (0.37, 0.60) <0.001 0.73 (0.41, 1.29) 0.646 0.98 (0.59, 1.63) 0.963 0.88 (0.65, 1.18) 0.734
Caffeine intake: Q4 0.95 (0.76, 1.18) 0.856 0.64 (0.50, 0.83) 0.047 0.46 (0.22, 0.96) 0.295 1.21 (0.72, 2.04) 0.786 0.88 (0.69, 1.12) 0.653
1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method
tbl_uni2_secondary
Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy
Variable Atopic dermatitis OR (95% CI) Atopic dermatitis p value1 Food allergy OR (95% CI) Food allergy p value1
Total energy intake (kcal): Q1 1 (Ref) 1 (Ref)
Total energy intake (kcal): Q2 1.34 (0.71, 2.52) 0.905 1.09 (0.69, 1.71) 0.921
Total energy intake (kcal): Q3 0.85 (0.43, 1.67) 0.921 1.05 (0.74, 1.49) 0.921
Total energy intake (kcal): Q4 0.78 (0.45, 1.36) 0.905 1.04 (0.69, 1.57) 0.939
Carbohydrate intake: Q1 1 (Ref) 1 (Ref)
Carbohydrate intake: Q2 1.42 (0.57, 3.51) 0.909 1.13 (0.76, 1.66) 0.921
Carbohydrate intake: Q3 1.36 (0.53, 3.47) 0.921 1.00 (0.63, 1.57) 0.990
Carbohydrate intake: Q4 0.84 (0.51, 1.39) 0.921 1.16 (0.79, 1.71) 0.909
Protein intake: Q1 1 (Ref) 1 (Ref)
Protein intake: Q2 0.95 (0.45, 2.04) 0.942 1.28 (0.85, 1.92) 0.892
Protein intake: Q3 0.76 (0.33, 1.72) 0.921 1.42 (0.98, 2.05) 0.727
Protein intake: Q4 0.67 (0.36, 1.25) 0.892 0.92 (0.61, 1.38) 0.921
Total fat intake: Q1 1 (Ref) 1 (Ref)
Total fat intake: Q2 0.74 (0.30, 1.85) 0.921 1.11 (0.71, 1.75) 0.921
Total fat intake: Q3 0.93 (0.44, 1.93) 0.937 1.08 (0.68, 1.72) 0.921
Total fat intake: Q4 0.62 (0.38, 1.04) 0.727 1.11 (0.76, 1.60) 0.921
Saturated fat intake: Q1 1 (Ref) 1 (Ref)
Saturated fat intake: Q2 0.77 (0.39, 1.52) 0.909 1.04 (0.76, 1.43) 0.921
Saturated fat intake: Q3 0.87 (0.47, 1.62) 0.921 1.17 (0.76, 1.80) 0.921
Saturated fat intake: Q4 0.74 (0.42, 1.30) 0.905 0.86 (0.59, 1.26) 0.909
Polyunsaturated fat intake: Q1 1 (Ref) 1 (Ref)
Polyunsaturated fat intake: Q2 0.65 (0.31, 1.34) 0.892 0.85 (0.60, 1.21) 0.905
Polyunsaturated fat intake: Q3 1.05 (0.51, 2.16) 0.942 0.91 (0.64, 1.31) 0.921
Polyunsaturated fat intake: Q4 1.16 (0.67, 1.99) 0.921 0.71 (0.53, 0.96) 0.641
Monounsaturated fat intake: Q1 1 (Ref) 1 (Ref)
Monounsaturated fat intake: Q2 1.52 (0.73, 3.16) 0.892 1.09 (0.70, 1.69) 0.921
Monounsaturated fat intake: Q3 1.50 (0.75, 2.97) 0.892 1.16 (0.76, 1.76) 0.921
Monounsaturated fat intake: Q4 1.51 (1.00, 2.29) 0.669 0.98 (0.69, 1.38) 0.942
Omega-3 fat intake: Q1 1 (Ref) 1 (Ref)
Omega-3 fat intake: Q2 0.74 (0.30, 1.85) 0.921 1.18 (0.91, 1.54) 0.892
Omega-3 fat intake: Q3 1.33 (0.70, 2.51) 0.905 0.94 (0.69, 1.30) 0.921
Omega-3 fat intake: Q4 1.21 (0.77, 1.90) 0.905 0.93 (0.63, 1.36) 0.921
Omega-6 fat intake: Q1 1 (Ref) 1 (Ref)
Omega-6 fat intake: Q2 0.90 (0.46, 1.75) 0.921 0.77 (0.55, 1.07) 0.802
Omega-6 fat intake: Q3 1.24 (0.63, 2.42) 0.921 0.82 (0.55, 1.20) 0.905
Omega-6 fat intake: Q4 1.35 (0.80, 2.29) 0.892 0.72 (0.55, 0.96) 0.641
Cholesterol intake: Q1 1 (Ref) 1 (Ref)
Cholesterol intake: Q2 0.81 (0.37, 1.73) 0.921 1.34 (0.87, 2.06) 0.892
Cholesterol intake: Q3 0.63 (0.30, 1.35) 0.892 0.95 (0.64, 1.40) 0.921
Cholesterol intake: Q4 0.70 (0.41, 1.19) 0.892 1.11 (0.72, 1.71) 0.921
Vitamin A intake: Q1 1 (Ref) 1 (Ref)
Vitamin A intake: Q2 0.87 (0.49, 1.55) 0.921 0.86 (0.61, 1.21) 0.907
Vitamin A intake: Q3 1.09 (0.62, 1.91) 0.921 0.89 (0.60, 1.33) 0.921
Vitamin A intake: Q4 1.01 (0.50, 2.01) 0.990 0.95 (0.67, 1.35) 0.921
Beta-carotene intake: Q1 1 (Ref) 1 (Ref)
Beta-carotene intake: Q2 1.40 (0.76, 2.60) 0.905 0.69 (0.48, 0.99) 0.669
Beta-carotene intake: Q3 0.83 (0.36, 1.91) 0.921 0.82 (0.66, 1.04) 0.802
Beta-carotene intake: Q4 1.21 (0.62, 2.35) 0.921 0.73 (0.51, 1.04) 0.727
Vitamin C intake: Q1 1 (Ref) 1 (Ref)
Vitamin C intake: Q2 1.63 (0.70, 3.83) 0.892 0.75 (0.52, 1.07) 0.802
Vitamin C intake: Q3 1.31 (0.55, 3.11) 0.921 0.66 (0.44, 0.98) 0.669
Vitamin C intake: Q4 1.26 (0.74, 2.15) 0.905 0.80 (0.52, 1.22) 0.905
Vitamin E intake: Q1 1 (Ref) 1 (Ref)
Vitamin E intake: Q2 0.92 (0.47, 1.81) 0.921 1.02 (0.72, 1.45) 0.942
Vitamin E intake: Q3 1.04 (0.50, 2.18) 0.942 0.94 (0.65, 1.37) 0.921
Vitamin E intake: Q4 0.85 (0.46, 1.56) 0.921 0.71 (0.52, 0.97) 0.641
Vitamin B6 intake: Q1 1 (Ref) 1 (Ref)
Vitamin B6 intake: Q2 0.88 (0.36, 2.14) 0.921 0.98 (0.69, 1.38) 0.942
Vitamin B6 intake: Q3 1.50 (0.83, 2.72) 0.892 1.33 (0.89, 1.99) 0.892
Vitamin B6 intake: Q4 1.34 (0.58, 3.08) 0.921 1.08 (0.68, 1.73) 0.921
Vitamin B12 intake: Q1 1 (Ref) 1 (Ref)
Vitamin B12 intake: Q2 0.77 (0.33, 1.83) 0.921 0.93 (0.68, 1.26) 0.921
Vitamin B12 intake: Q3 0.64 (0.28, 1.48) 0.905 0.83 (0.59, 1.19) 0.905
Vitamin B12 intake: Q4 0.99 (0.46, 2.13) 0.990 0.61 (0.39, 0.95) 0.641
Folic acid intake: Q1 1 (Ref) 1 (Ref)
Folic acid intake: Q2 0.66 (0.39, 1.13) 0.802 1.59 (1.08, 2.35) 0.641
Folic acid intake: Q3 0.81 (0.45, 1.46) 0.921 1.11 (0.76, 1.60) 0.921
Folic acid intake: Q4 1.09 (0.55, 2.16) 0.921 1.47 (1.04, 2.08) 0.641
Thiamin intake: Q1 1 (Ref) 1 (Ref)
Thiamin intake: Q2 1.12 (0.55, 2.27) 0.921 0.94 (0.64, 1.39) 0.921
Thiamin intake: Q3 1.04 (0.68, 1.59) 0.942 1.16 (0.82, 1.63) 0.909
Thiamin intake: Q4 1.12 (0.49, 2.59) 0.921 1.04 (0.73, 1.49) 0.937
Riboflavin intake: Q1 1 (Ref) 1 (Ref)
Riboflavin intake: Q2 1.44 (0.79, 2.62) 0.892 0.90 (0.66, 1.23) 0.921
Riboflavin intake: Q3 0.79 (0.42, 1.49) 0.920 1.05 (0.71, 1.56) 0.921
Riboflavin intake: Q4 1.43 (0.64, 3.22) 0.905 1.00 (0.72, 1.39) 0.996
Niacin intake: Q1 1 (Ref) 1 (Ref)
Niacin intake: Q2 1.10 (0.65, 1.86) 0.921 1.26 (0.87, 1.81) 0.892
Niacin intake: Q3 1.18 (0.81, 1.73) 0.905 1.15 (0.82, 1.64) 0.909
Niacin intake: Q4 1.27 (0.68, 2.36) 0.909 1.05 (0.72, 1.55) 0.921
Iron intake: Q1 1 (Ref) 1 (Ref)
Iron intake: Q2 0.98 (0.45, 2.14) 0.981 1.30 (0.89, 1.88) 0.892
Iron intake: Q3 0.88 (0.38, 2.04) 0.921 1.23 (0.94, 1.61) 0.802
Iron intake: Q4 0.71 (0.36, 1.41) 0.905 1.06 (0.72, 1.55) 0.921
Magnesium intake: Q1 1 (Ref) 1 (Ref)
Magnesium intake: Q2 0.75 (0.45, 1.25) 0.892 0.85 (0.56, 1.27) 0.909
Magnesium intake: Q3 0.78 (0.49, 1.25) 0.905 0.91 (0.66, 1.27) 0.921
Magnesium intake: Q4 1.13 (0.53, 2.39) 0.921 0.68 (0.49, 0.95) 0.641
Zinc intake: Q1 1 (Ref) 1 (Ref)
Zinc intake: Q2 0.75 (0.41, 1.37) 0.905 1.41 (0.93, 2.13) 0.802
Zinc intake: Q3 1.03 (0.62, 1.73) 0.942 1.25 (0.80, 1.95) 0.905
Zinc intake: Q4 1.02 (0.50, 2.07) 0.981 1.28 (0.87, 1.89) 0.892
Selenium intake: Q1 1 (Ref) 1 (Ref)
Selenium intake: Q2 1.35 (0.75, 2.44) 0.905 1.02 (0.72, 1.45) 0.942
Selenium intake: Q3 1.69 (1.00, 2.87) 0.669 0.94 (0.60, 1.47) 0.921
Selenium intake: Q4 1.14 (0.59, 2.23) 0.921 1.02 (0.74, 1.42) 0.942
Fiber intake: Q1 1 (Ref) 1 (Ref)
Fiber intake: Q2 1.53 (0.69, 3.36) 0.905 0.71 (0.48, 1.04) 0.727
Fiber intake: Q3 1.21 (0.50, 2.92) 0.921 0.72 (0.50, 1.03) 0.727
Fiber intake: Q4 1.18 (0.54, 2.55) 0.921 0.66 (0.49, 0.88) 0.641
Caffeine intake: Q1 1 (Ref) 1 (Ref)
Caffeine intake: Q2 0.94 (0.42, 2.14) 0.942 0.84 (0.62, 1.13) 0.892
Caffeine intake: Q3 1.04 (0.52, 2.09) 0.942 0.84 (0.59, 1.20) 0.905
Caffeine intake: Q4 1.38 (0.59, 3.25) 0.909 0.90 (0.62, 1.29) 0.921
1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method

Multivariable regression analyses

Survey-weighted multivariable logistic regression models were used to assess the association between DII score and each disease outcome after adjustment for covariates of interest. Three models were generated. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI, physical activity, and smoking status. Results are shown in Figure 2 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension and in Figure 3 for eczema and food allergy.

Overall, there were few significant associations across the adjusted models. For asthma, no statistically significant associations were detected, although Model 3 suggested a possible, non-significant protective trend (OR 0.96, 95% CI 0.91–1.00; p=0.059). For allergic rhinitis, higher DII was associated with lower odds in Model 1 (OR 0.95, 95% CI 0.91–0.99; p=0.01), but this association was not statistically significant in Model 2 (OR 0.96, 95% CI 0.92–1.00; p=0.054) and Model 3 (OR 0.97, 95% CI 0.92–1.01; p=0.113). Among cardiometabolic outcomes, higher DII was associated with increased odds of hypertension in Model 1 (OR 1.07, 95% CI 1.03–1.12; p<0.001) and Model 2 (OR 1.06, 95% CI 1.01–1.10; p=0.011). However, this association was no longer significant in Model 3 (OR 1.03, 95% CI 0.99–1.08; p=0.149). For secondary outcomes, higher DII was associated with decreased odds of food allergy only in Model 1 (OR 0.93, 95% CI 0.89–0.98; p=0.006).

# Adjusted multivariable models----------------------------------------------------------------
## Variables for models 1, 2, 3
covariates_m1 <- c("age", "sex")
covariates_m2 <- c("age", "sex", "race", "insur", "pir", "educ")
covariates_m3 <- c("age", "sex", "race", "insur", "pir", "educ", "bmi_cat", "phys_act", "smoking")

## Helper function to build regression formulas
make_dii_formula <- function(outcome_var, covariates) {
  rhs <- c(covariates, "dii")
  as.formula(
    paste(outcome_var, "~", paste(rhs, collapse = " + "))
  )
}

## Fit models for primary outcomes

fit_primary_models <- imap(
  outcome_list,
  ~ list(
    label  = .x$label,
    design = .x$design,
    m1 = svyglm(
      make_dii_formula(.x$var, covariates_m1),
      design = .x$design,
      family = quasibinomial()
    ),
    m2 = svyglm(
      make_dii_formula(.x$var, covariates_m2),
      design = .x$design,
      family = quasibinomial()
    ),
    m3 = svyglm(
      make_dii_formula(.x$var, covariates_m3),
      design = .x$design,
      family = quasibinomial()
    )
  )
)

## Fit models for secondary outcomes

fit_secondary_models <- imap(
  outcome_list_secondary,
  ~ list(
    label  = .x$label,
    design = .x$design,
    m1 = svyglm(
      make_dii_formula(.x$var, covariates_m1),
      design = .x$design,
      family = quasibinomial()
    ),
    m2 = svyglm(
      make_dii_formula(.x$var, covariates_m2),
      design = .x$design,
      family = quasibinomial()
    ),
    m3 = svyglm(
      make_dii_formula(.x$var, covariates_m3),
      design = .x$design,
      family = quasibinomial()
    )
  )
)

# Extract adjusted multivariable associations from models 1-3----------------------------------
extract_dii_from_model <- function(fit, outcome_label, model_label) {
  td <- broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
    filter(term == "dii")

  if (nrow(td) == 0) {
    return(tibble(
      outcome_lab   = outcome_label,
      model         = model_label,
      OR            = NA_real_,
      LCL           = NA_real_,
      UCL           = NA_real_,
      p             = NA_real_,
      `OR (95% CI)` = "NE",
      `p value`     = NA_character_,
      is_ne         = TRUE
    ))
  }

  OR  <- td$estimate[1]
  LCL <- td$conf.low[1]
  UCL <- td$conf.high[1]
  p   <- td$p.value[1]

  is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
    OR <= 0 | LCL <= 0 | UCL <= 0 |
    is.na(OR) | is.na(LCL) | is.na(UCL) |
    abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10

  if (is_ne) {
    or_ci_str <- "NE"
    p_str     <- NA_character_
  } else {
    or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
    p_str     <- scales::pvalue(p, accuracy = 0.001)
  }

  tibble(
    outcome_lab   = outcome_label,
    model         = model_label,
    OR            = ifelse(is_ne, NA_real_, OR),
    LCL           = ifelse(is_ne, NA_real_, LCL),
    UCL           = ifelse(is_ne, NA_real_, UCL),
    p             = ifelse(is_ne, NA_real_, p),
    `OR (95% CI)` = or_ci_str,
    `p value`     = p_str,
    is_ne         = is_ne
  )
}

## Primary outcomes in adjusted models 1–3

dii_primary_models_long <- map_dfr(
  fit_primary_models,
  function(m) {
    bind_rows(
      extract_dii_from_model(m$m1, m$label, "Model 1"),
      extract_dii_from_model(m$m2, m$label, "Model 2"),
      extract_dii_from_model(m$m3, m$label, "Model 3")
    )
  }
)

## Secondary outcomes in adjusted models 1–3

dii_secondary_models_long <- map_dfr(
  fit_secondary_models,
  function(m) {
    bind_rows(
      extract_dii_from_model(m$m1, m$label, "Model 1"),
      extract_dii_from_model(m$m2, m$label, "Model 2"),
      extract_dii_from_model(m$m3, m$label, "Model 3")
    )
  }
)

# Forest plots for multivariable associations--------------------------------------------------

## Helper to construct label columns for OR and p values
add_or_p_labels <- function(df) {
  df |>
    mutate(
      or_label = if_else(`OR (95% CI)` == "NE", "NE", `OR (95% CI)`),
      p_label  = if_else(
        is.na(`p value`),
        "",
        paste0("p = ", `p value`)
      )
    )
}

## DII associations with primary outcomes
dii_primary_plot_data <- dii_primary_models_long |>
  add_or_p_labels() |>
  mutate(
    outcome_lab = factor(
      outcome_lab,
      levels = c(
        "Asthma",
        "Allergic rhinitis",
        "Inflammatory arthritis",
        "Diabetes",
        "Hypertension"
      )
    ),
    # Adjust model levels (more-adjusted model on top)
    model_label = factor(
      model,
      levels = c("Model 3", "Model 2", "Model 1")
    )
  )

## Compute axis limits based on observed CIs with padding
dii_primary_limits <- dii_primary_plot_data |>
  filter(!is_ne) |>
  summarise(
    xmin = min(LCL, na.rm = TRUE),
    xmax = max(UCL, na.rm = TRUE)
  )

x_min_primary <- pmax(0.1, dii_primary_limits$xmin * 0.9)
x_max_primary <- pmin(2.5, dii_primary_limits$xmax * 1.1)

dii_primary_forest_plot <- dii_primary_plot_data |>
  filter(!is_ne) |>
  ggplot(
    aes(
      x = OR,
      y = model_label
    )
  ) +
  geom_point() +
  geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  geom_text(
    aes(
      label = paste0(or_label, "\n", p_label),
      x     = pmin(UCL, x_max_primary * 0.95)
    ),
    hjust = -0.05,
    size  = 3.5
  ) +
  scale_x_continuous(
    limits = c(x_min_primary, x_max_primary),
    expand = expansion(mult = c(0.02, 0.1))
  ) +
  facet_wrap(~ outcome_lab, ncol = 2) +
  labs(
    x     = "Adjusted odds ratio for Dietary Inflammatory Index",
    y     = NULL,
    title = "Figure 2: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, or Hypertension"
  ) +
  theme_minimal() +
  theme(
    plot.title    = element_text(face = "bold", size = 13, hjust = 0.5),
    strip.text    = element_text(face = "bold", size = 11),
    axis.text.y   = element_text(face = "bold", size = 10),
    axis.text.x   = element_text(size = 10),
    axis.title.x  = element_text(face = "bold", size = 11),
    panel.spacing = grid::unit(0.4, "lines")
  )

## DII associations with secondary outcomes
dii_secondary_plot_data <- dii_secondary_models_long |>
  mutate(
    outcome_lab = recode(
      outcome_lab,
      "Atopic dermatitis" = "Eczema"
    )
  ) |>
  add_or_p_labels() |>
  mutate(
    outcome_lab = factor(
      outcome_lab,
      levels = c("Eczema", "Food allergy")
    ),
    model_label = factor(
      model,
      levels = c("Model 3", "Model 2", "Model 1")
    )
  )

dii_secondary_limits <- dii_secondary_plot_data |>
  filter(!is_ne) |>
  summarise(
    xmin = min(LCL, na.rm = TRUE),
    xmax = max(UCL, na.rm = TRUE)
  )

x_min_secondary <- pmax(0.1, dii_secondary_limits$xmin * 0.9)
x_max_secondary <- pmin(2.5, dii_secondary_limits$xmax * 1.1)

dii_secondary_forest_plot <- dii_secondary_plot_data |>
  filter(!is_ne) |>
  ggplot(
    aes(
      x = OR,
      y = model_label
    )
  ) +
  geom_point() +
  geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  geom_text(
    aes(
      label = paste0(or_label, "\n", p_label),
      x     = pmin(UCL, x_max_secondary * 0.95)
    ),
    hjust = -0.05,
    size  = 3.5
  ) +
  scale_x_continuous(
    limits = c(x_min_secondary, x_max_secondary),
    expand = expansion(mult = c(0.02, 0.1))
  ) +
  facet_wrap(~ outcome_lab, ncol = 1) +
  labs(
    x       = "Adjusted odds ratio for Dietary Inflammatory Index",
    y       = NULL,
    title   = "Figure 3: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Eczema or Food Allergy",
    caption = "Note: Model 3 was not estimatable for eczema because of sparse outcome counts."
  ) +
  theme_minimal() +
  theme(
    plot.title    = element_text(face = "bold", size = 13, hjust = 0.5),
    strip.text    = element_text(face = "bold", size = 11),
    axis.text.y   = element_text(face = "bold", size = 10),
    axis.text.x   = element_text(size = 10),
    axis.title.x  = element_text(face = "bold", size = 11),
    panel.spacing = grid::unit(0.4, "lines")
  )

# Print forest plots---------------------------------------------------------------------------
dii_primary_forest_plot

dii_secondary_forest_plot

Effect modification analyses

Effect modification analyses were performed to assess whether the association between DII and each disease outcome differed by sex, BMI category, or vitamin D status. Results are shown in Table 9 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension, and in Table 10 for eczema and food allergy. Overall, only a few interactions were statistically significant and generally without a clear trend across categories. For inflammatory arthritis, BMI significantly modified the association (p<0.001), but without a clear trend across BMI categories. For diabetes, vitamin D status significantly modified the association (p<0.001), with those with vitamin D deficiency exhibiting a higher association between DII and diabetes (OR 1.70, 95% CI 1.13–2.57), while those with vitamin D insufficiency exhibiting a reduced association (OR 0.81, 95% CI 0.65–0.99). For secondary outcomes, eczema demonstrated a significant interaction with BMI (p=0.012), with an increased association seen among those who were underweight (OR 2.20, 95% CI 1.25–3.88). No significant interaction terms were observed for sex.

# Effect modification analyses-----------------------------------------------------------------
effectmod_modifiers <- tibble(
  modifier_var   = c("sex",        "bmi_cat",        "vitD25oh_cat"),
  modifier_label = c("Sex",        "BMI category",   "Vitamin D category")
)

fit_dii_interaction_model <- function(outcome_info, covariates_base, modifier_var) {
  outcome_var <- outcome_info$var

  base_covars <- covariates_base
  if (!(modifier_var %in% base_covars)) {
    base_covars <- c(base_covars, modifier_var)
  }

  rhs_terms <- c("dii", modifier_var, paste0("dii:", modifier_var), base_covars)
  rhs_terms <- unique(rhs_terms)

  f <- as.formula(
    paste(
      outcome_var,
      "~",
      paste(rhs_terms, collapse = " + ")
    )
  )

  svyglm(
    f,
    design = outcome_info$design,
    family = quasibinomial()
  )
}

run_dii_effectmod <- function(fit, outcome_label, modifier_var, modifier_label) {
  cf <- coef(fit)
  V  <- vcov(fit)
  mf <- model.frame(fit)

  mod <- mf[[modifier_var]]
  if (!is.factor(mod)) {
    mod <- factor(mod)
  }
  levels_mod <- levels(mod)

  int_pattern1 <- paste0("^dii:", modifier_var)
  int_pattern2 <- paste0("^", modifier_var, ".*:dii")

  interaction_terms <- names(cf)[
    grepl(int_pattern1, names(cf)) | grepl(int_pattern2, names(cf))
  ]

  if (length(interaction_terms) > 0) {
    beta_int <- cf[interaction_terms]
    V_int    <- V[interaction_terms, interaction_terms, drop = FALSE]

    stat  <- as.numeric(t(beta_int) %*% solve(V_int) %*% beta_int)
    df    <- length(beta_int)
    p_int <- 1 - pchisq(stat, df = df)
  } else {
    p_int <- NA_real_
  }

  map_dfr(
    seq_along(levels_mod),
    function(j) {
      lvl <- levels_mod[j]

      if (!("dii" %in% names(cf))) {
        return(tibble())
      }

      beta_dii <- cf[["dii"]]
      var_dii  <- V["dii", "dii"]

      if (j == 1L) {
        beta <- beta_dii
        var  <- var_dii
      } else {
        int_name_candidates <- c(
          paste0("dii:", modifier_var, lvl),
          paste0(modifier_var, lvl, ":dii")
        )
        int_name <- int_name_candidates[int_name_candidates %in% names(cf)]

        if (length(int_name) == 0L) {
          return(tibble())
        }

        int_name <- int_name[1]

        beta_int <- cf[[int_name]]
        var_int  <- V[int_name, int_name]
        covar    <- V["dii", int_name]

        beta <- beta_dii + beta_int
        var  <- var_dii + var_int + 2 * covar
      }

      se <- sqrt(var)

      OR  <- exp(beta)
      LCL <- exp(beta - 1.96 * se)
      UCL <- exp(beta + 1.96 * se)

      is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
        OR <= 0 | LCL <= 0 | UCL <= 0 |
        is.na(OR) | is.na(LCL) | is.na(UCL) |
        abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10

      if (is_ne) {
        or_ci_str <- "NE"
      } else {
        or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
      }

      tibble(
        outcome_lab    = outcome_label,
        modifier_var   = modifier_var,
        modifier_label = modifier_label,
        level          = lvl,
        OR             = ifelse(is_ne, NA_real_, OR),
        LCL            = ifelse(is_ne, NA_real_, LCL),
        UCL            = ifelse(is_ne, NA_real_, UCL),
        is_ne          = is_ne,
        `OR (95% CI)`  = or_ci_str,
        p_int          = p_int
      )
    }
  )
}

## Level order for BMI and Vitamin D
bmi_level_order <- c(
  "Underweight (<18.5)",
  "Normal (18.5–24.9)",
  "Overweight (25–29.9)",
  "Obesity I (30–34.9)",
  "Obesity II (35–39.9)",
  "Obesity III (≥40)"
)

vitD_level_order <- c(
  "<20 (deficient)",
  "20–29 (insufficient)",
  "≥30 (sufficient)"
)

## Helper function to build display table
build_effectmod_display <- function(df) {
  df |>
    arrange(outcome_lab, modifier_var, level) |>
    group_by(outcome_lab, modifier_var, modifier_label) |>
    group_split() |>
    map_dfr(function(grp) {
      out_name <- grp$outcome_lab[[1]]
      mod_lab  <- grp$modifier_label[[1]]

      # Fix modifier text
      mod_label_clean <- case_when(
        mod_lab == "BMI category"       ~ "by BMI category",
        mod_lab == "Vitamin D category" ~ "by Vitamin D category",
        TRUE                            ~ paste0("by ", tolower(mod_lab))
      )

      p_int_vals <- grp$`p for interaction`[!is.na(grp$`p for interaction`)]
      p_int_disp <- ifelse(length(p_int_vals) == 0, NA_character_, p_int_vals[1])

      header_row <- tibble(
        Outcome             = out_name,
        `Modifier level`    = paste0(out_name, " ", mod_label_clean),
        `OR (95% CI)`       = "",
        `p for interaction` = p_int_disp
      )

      level_rows <- grp |>
        transmute(
          Outcome             = out_name,
          `Modifier level`    = as.character(level),
          `OR (95% CI)`       = `OR (95% CI)`,
          `p for interaction` = ""
        )

      bind_rows(header_row, level_rows)
    })
}

## Effect modification for primary outcomes
effectmod_primary_long <- map_dfr(
  names(outcome_list),
  function(out_nm) {
    outcome_info <- outcome_list[[out_nm]]

    map_dfr(
      seq_len(nrow(effectmod_modifiers)),
      function(i) {
        mod_var   <- effectmod_modifiers$modifier_var[i]
        mod_label <- effectmod_modifiers$modifier_label[i]

        fit_int <- fit_dii_interaction_model(
          outcome_info    = outcome_info,
          covariates_base = covariates_m3,
          modifier_var    = mod_var
        )

        run_dii_effectmod(
          fit            = fit_int,
          outcome_label  = outcome_info$label,
          modifier_var   = mod_var,
          modifier_label = mod_label
        )
      }
    )
  }
) |>
  mutate(
    `p for interaction` = ifelse(
      is.na(p_int),
      NA_character_,
      scales::pvalue(p_int, accuracy = 0.001)
    )
  )

effectmod_primary_long <- effectmod_primary_long |>
  mutate(
    modifier_var = factor(
      modifier_var,
      levels = c("sex", "bmi_cat", "vitD25oh_cat")
    ),
    level = as.character(level)
  ) |>
  mutate(
    level = case_when(
      modifier_var == "bmi_cat"      ~ factor(level, levels = bmi_level_order),
      modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
      TRUE                           ~ factor(level)
    ),
    ## Enforce outcome order for primary outcomes:
    ## Asthma, Allergic rhinitis, Inflammatory arthritis, Diabetes, Hypertension
    outcome_lab = factor(
      outcome_lab,
      levels = c(
        "Asthma",
        "Allergic rhinitis",
        "Inflammatory arthritis",
        "Diabetes",
        "Hypertension"
      )
    )
  ) |>
  arrange(
    outcome_lab,
    modifier_var,
    level
  )

effectmod_primary_display <- build_effectmod_display(effectmod_primary_long)

tbl_effectmod_primary <- effectmod_primary_display |>
  select(
    Outcome,
    `Modifier level`,
    `OR (95% CI)`,
    `p for interaction`
  ) |>
  gt(groupname_col = "Outcome") |>
  tab_header(
    title = md("**Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**")
  ) |>
  fmt_markdown(columns = everything()) |>
  cols_label(
    `Modifier level`    = md("**Modifier level**"),
    `OR (95% CI)`       = md("**OR (95% CI)**"),
    `p for interaction` = md("**p for interaction**")
  ) |>
  tab_options(table.font.size = px(10)) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups()
  ) |>
  tab_footnote(
    footnote = "NE = not estimatable due to low event rates or unstable estimates",
    locations = cells_body(
      columns = "OR (95% CI)",
      rows    = `OR (95% CI)` == "NE"
    )
  )

## Effect modification for secondary outcomes
effectmod_secondary_long <- map_dfr(
  names(outcome_list_secondary),
  function(out_nm) {
    outcome_info <- outcome_list_secondary[[out_nm]]

    map_dfr(
      seq_len(nrow(effectmod_modifiers)),
      function(i) {
        mod_var   <- effectmod_modifiers$modifier_var[i]
        mod_label <- effectmod_modifiers$modifier_label[i]

        fit_int <- fit_dii_interaction_model(
          outcome_info    = outcome_info,
          covariates_base = covariates_m3,
          modifier_var    = mod_var
        )

        run_dii_effectmod(
          fit            = fit_int,
          outcome_label  = outcome_info$label,
          modifier_var   = mod_var,
          modifier_label = mod_label
        )
      }
    )
  }
) |>
  mutate(
    outcome_lab = recode(
      outcome_lab,
      "Atopic dermatitis" = "Eczema"
    ),
    `p for interaction` = ifelse(
      is.na(p_int),
      NA_character_,
      scales::pvalue(p_int, accuracy = 0.001)
    )
  )

effectmod_secondary_long <- effectmod_secondary_long |>
  mutate(
    modifier_var = factor(
      modifier_var,
      levels = c("sex", "bmi_cat", "vitD25oh_cat")
    ),
    level = as.character(level)
  ) |>
  mutate(
    level = case_when(
      modifier_var == "bmi_cat"      ~ factor(level, levels = bmi_level_order),
      modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
      TRUE                           ~ factor(level)
    ),
    ## Enforce outcome order for secondary outcomes:
    ## Eczema, Food allergy
    outcome_lab = factor(
      outcome_lab,
      levels = c(
        "Eczema",
        "Food allergy"
      )
    )
  ) |>
  arrange(
    outcome_lab,
    modifier_var,
    level
  )

effectmod_secondary_display <- build_effectmod_display(effectmod_secondary_long)

tbl_effectmod_secondary <- effectmod_secondary_display |>
  select(
    Outcome,
    `Modifier level`,
    `OR (95% CI)`,
    `p for interaction`
  ) |>
  gt(groupname_col = "Outcome") |>
  tab_header(
    title = md("**Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy**")
  ) |>
  fmt_markdown(columns = everything()) |>
  cols_label(
    `Modifier level`    = md("**Modifier level**"),
    `OR (95% CI)`       = md("**OR (95% CI)**"),
    `p for interaction` = md("**p for interaction**")
  ) |>
  tab_options(table.font.size = px(10)) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups()
  ) |>
  tab_footnote(
    footnote = "NE = not estimatable due to low event rates or unstable estimates",
    locations = cells_body(
      columns = "OR (95% CI)",
      rows    = `OR (95% CI)` == "NE"
    )
  )

# Print effect modification tables-------------------------------------------------------------
tbl_effectmod_primary
Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension
Modifier level OR (95% CI) p for interaction
Asthma
Asthma by sex 0.072
Female 0.95 (0.91, 1.00)
Male 1.01 (0.96, 1.06)
Asthma by BMI category 0.808
Underweight (<18.5) 1.04 (0.73, 1.48)
Normal (18.5–24.9) 0.97 (0.91, 1.03)
Overweight (25–29.9) 0.99 (0.92, 1.07)
Obesity I (30–34.9) 0.96 (0.89, 1.04)
Obesity II (35–39.9) 1.03 (0.93, 1.14)
Obesity III (≥40) 0.97 (0.87, 1.09)
Asthma by Vitamin D category 0.842
<20 (deficient) 1.00 (0.79, 1.26)
20–29 (insufficient) 0.94 (0.81, 1.09)
≥30 (sufficient) 0.97 (0.93, 1.02)
Allergic rhinitis
Allergic rhinitis by sex 0.337
Female 0.95 (0.89, 1.00)
Male 0.99 (0.93, 1.05)
Allergic rhinitis by BMI category 0.335
Underweight (<18.5) 0.85 (0.62, 1.18)
Normal (18.5–24.9) 0.96 (0.91, 1.02)
Overweight (25–29.9) 0.98 (0.91, 1.06)
Obesity I (30–34.9) 1.00 (0.92, 1.08)
Obesity II (35–39.9) 1.00 (0.87, 1.16)
Obesity III (≥40) 0.81 (0.67, 0.97)
Allergic rhinitis by Vitamin D category 0.861
<20 (deficient) 1.02 (0.83, 1.25)
20–29 (insufficient) 0.97 (0.79, 1.18)
≥30 (sufficient) 0.96 (0.92, 1.01)
Inflammatory arthritis
Inflammatory arthritis by sex 0.263
Female 1.02 (0.88, 1.18)
Male 1.15 (0.94, 1.40)
Inflammatory arthritis by BMI category <0.001
Underweight (<18.5) NE1
Normal (18.5–24.9) 1.04 (0.89, 1.22)
Overweight (25–29.9) 1.02 (0.87, 1.18)
Obesity I (30–34.9) 1.10 (0.78, 1.55)
Obesity II (35–39.9) 1.10 (0.93, 1.31)
Obesity III (≥40) 1.12 (0.69, 1.84)
Inflammatory arthritis by Vitamin D category 0.603
<20 (deficient) 0.72 (0.23, 2.25)
20–29 (insufficient) 0.93 (0.72, 1.21)
≥30 (sufficient) 1.08 (0.92, 1.26)
Diabetes
Diabetes by sex 0.641
Female 1.04 (0.91, 1.19)
Male 0.99 (0.85, 1.15)
Diabetes by BMI category 0.994
Underweight (<18.5) 0.98 (0.87, 1.10)
Normal (18.5–24.9) 1.04 (0.72, 1.49)
Overweight (25–29.9) 1.01 (0.84, 1.21)
Obesity I (30–34.9) 1.03 (0.83, 1.27)
Obesity II (35–39.9) 0.98 (0.82, 1.16)
Obesity III (≥40) 1.01 (0.87, 1.19)
Diabetes by Vitamin D category <0.001
<20 (deficient) 1.70 (1.13, 2.57)
20–29 (insufficient) 0.81 (0.65, 0.99)
≥30 (sufficient) 1.05 (0.93, 1.18)
Hypertension
Hypertension by sex 0.285
Female 1.06 (0.99, 1.15)
Male 1.01 (0.96, 1.07)
Hypertension by BMI category 0.135
Underweight (<18.5) 1.08 (0.82, 1.42)
Normal (18.5–24.9) 1.07 (0.98, 1.16)
Overweight (25–29.9) 1.01 (0.94, 1.09)
Obesity I (30–34.9) 1.10 (1.01, 1.20)
Obesity II (35–39.9) 0.97 (0.87, 1.08)
Obesity III (≥40) 0.98 (0.87, 1.09)
Hypertension by Vitamin D category 0.047
<20 (deficient) 0.82 (0.64, 1.05)
20–29 (insufficient) 0.94 (0.82, 1.07)
≥30 (sufficient) 1.05 (1.01, 1.10)
1 NE = not estimatable due to low event rates or unstable estimates
tbl_effectmod_secondary
Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy
Modifier level OR (95% CI) p for interaction
Eczema
Eczema by sex 0.341
Female 1.02 (0.83, 1.26)
Male 0.90 (0.76, 1.06)
Eczema by BMI category 0.012
Underweight (<18.5) 2.20 (1.25, 3.88)
Normal (18.5–24.9) 0.95 (0.76, 1.17)
Overweight (25–29.9) 1.04 (0.83, 1.30)
Obesity I (30–34.9) 0.86 (0.69, 1.08)
Obesity II (35–39.9) 1.24 (0.73, 2.10)
Obesity III (≥40) 0.86 (0.60, 1.23)
Eczema by Vitamin D category 0.184
<20 (deficient) 0.57 (0.21, 1.54)
20–29 (insufficient) 1.22 (0.79, 1.89)
≥30 (sufficient) 0.96 (0.85, 1.09)
Food allergy
Food allergy by sex 0.322
Female 0.93 (0.87, 1.00)
Male 0.98 (0.89, 1.08)
Food allergy by BMI category 0.988
Underweight (<18.5) 0.75 (0.36, 1.56)
Normal (18.5–24.9) 0.96 (0.87, 1.07)
Overweight (25–29.9) 0.95 (0.85, 1.07)
Obesity I (30–34.9) 0.95 (0.83, 1.09)
Obesity II (35–39.9) 0.98 (0.81, 1.17)
Obesity III (≥40) 0.93 (0.78, 1.11)
Food allergy by Vitamin D category 0.313
<20 (deficient) 0.80 (0.64, 1.00)
20–29 (insufficient) 0.97 (0.84, 1.14)
≥30 (sufficient) 0.95 (0.88, 1.02)

Machine learning analyses

Exploratory machine learning models were developed to identify predictors of disease beyond those detected in traditional regression analyses. For each outcome, four algorithms – namely LR, LASSO, RF, and XGB – were trained on one of two predictor sets: (1) core demographic, psychosocial, clinical, and dietary covariates, including the total DII score, and (2) quartiles of individual nutrient intakes. ROC curves (Figures 4-5) and AUROC values (Tables 11–12) summarize model performance for models with core covariates and nutrients only, respectively. Diabetes demonstrated the strongest performance (AUROC 0.83–0.92), followed by hypertension (AUROC 0.72–0.76) and inflammatory arthritis (AUROC 0.64–0.76). Performance was more modest for allergic rhinitis (AUROC 0.62–0.66) and asthma (AUROC 0.61–0.62), and was notably lower for food allergy (AUROC 0.49–0.59) and eczema (AUROC 0.44–0.60). Across all outcomes, models trained using nutrient-intake quartiles had comparatively reduced discrimination (AUROC 0.46–0.65).

Feature importance analyses were performed to acquire insight into potentially important predictors related to disease outcomes. In core covariate models (Figures 6–12), the DII score consistently ranked among the top 10 predictors for every disease outcome. For diabetes and hypertension, important features included markers of hyperglycemia, hyperlipiedemia, and adiposity, while DII scores, cardiometabolic factors, and psychosocial variables were influential in models for allergic outcomes. In contrast, nutrient-only models (Figures 16–19) demonstrated inconsistent patterns across disease outcomes. Nutrients such as PUFA, omega-6 fats, caffeine, total caloric intake, iron, and B vitamins appeared recurrently but without disease-specific consistency.

# Model run options, outcome definitions, and helper functions---------------------------------
options(yardstick.event_first = "second") # second outcome is disease

## Trial mode. TRUE = abbreviated, trial run (3-fold CV, smaller grids) for exploratory analysis; FALSE = full run (10-fold CV, larger grids) for final analysis
trial_mode  <- FALSE
trial_max_n <- 2000  # maximum complete cases per outcome for trial runs

tidymodels_prefer()

ml_outcomes <- list(
  asthma = list(
    label      = "Asthma",
    data       = asthma_reg,
    outcome    = "asthma",
    is_primary = TRUE
  ),
  ar = list(
    label      = "Allergic rhinitis",
    data       = ar_reg,
    outcome    = "ar",
    is_primary = TRUE
  ),
  arthritis = list(
    label      = "Inflammatory arthritis",
    data       = inflam_reg,
    outcome    = "arthritis",
    is_primary = TRUE
  ),
  htn = list(
    label      = "Hypertension",
    data       = htn_reg,
    outcome    = "htn",
    is_primary = TRUE
  ),
  diabetes = list(
    label      = "Diabetes",
    data       = dm_reg,
    outcome    = "diabetes",
    is_primary = TRUE
  ),
  ad = list(
    label      = "Atopic dermatitis",
    data       = ad_reg,
    outcome    = "ad",
    is_primary = FALSE
  ),
  fa = list(
    label      = "Food allergy",
    data       = fa_reg,
    outcome    = "fa",
    is_primary = FALSE
  )
)

## Helper function to add trend variables
add_ml_trend_vars <- function(data) {
  ### Vitamin D trend
  if ("vitD25oh_cat" %in% names(data) && !"vitD25oh_cat_trend" %in% names(data)) {
    data <- data |>
      mutate(
        vitD25oh_cat_trend = if_else(
          !is.na(vitD25oh_cat),
          as.numeric(vitD25oh_cat) - 1,
          NA_real_
        )
      )
  }

  ### DII quartile trend
  if ("dii_quart" %in% names(data) && !"dii_quart_trend" %in% names(data)) {
    data <- data |>
      mutate(
        dii_quart_trend = if_else(
          !is.na(dii_quart),
          as.numeric(dii_quart) - 1,
          NA_real_
        )
      )
  }

  data
}

# Predictor sets-------------------------------------------------------------------------------
# Set 1 variabvles: core covariates
predictors_set1 <- c(
  "age",
  "sex",
  "race",
  "insur",
  "educ",
  "pir",
  "food_sec",
  "smoking",
  "phys_act",
  "alcohol",
  "sbp_cat",
  "dbp_cat",
  "bmi_cat",
  "whtr_cat",
  "hdl",
  "non_hdl",
  "tchol",
  "hba1c",
  "vitD25oh_cat",
  "vitD25oh_cat_trend",
  "dii",
  "dii_quart_trend",
  "dii_etoh"
)

# Set 2 variables: individual nutrient quartiles
predictors_set2 <- c(
  "kcal_quart", "carb_quart", "protein_quart", "totalfat_quart", "satfat_quart",
  "pufa_quart", "mufa_quart", "n3fat_quart", "n6fat_quart", "choles_quart",
  "vitA_quart", "bcarotene_quart", "vitC_quart", "vitE_quart",
  "vitB6_quart", "vitB12_quart", "folicacid_quart",
  "thiamin_quart", "riboflavin_quart", "niacin_quart",
  "iron_quart", "mg_quart", "zn_quart", "se_quart",
  "fiber_quart", "caffeine_quart"
)

# Helper functions-----------------------------------------------------------------------------
## Helper function to prepare ML data 
prepare_ml_data <- function(data, outcome, predictors) {

  ### Add trend scores
  data <- add_ml_trend_vars(data)

  ### Keep only outcome and predictors
  vars_keep <- intersect(c(outcome, predictors), names(data))

  df <- data |>
    select(all_of(vars_keep)) |>
    drop_na() |>
    mutate(
      !!outcome := factor(
        .data[[outcome]],
        levels = c(0, 1),
        labels = c("No", "Yes")
      )
    )

  ### For trial runs: sub-sample if enough events and keep all rows for rare outcomes
  if (trial_mode && nrow(df) > trial_max_n) {
    case_n <- sum(df[[outcome]] == "Yes")
    min_events_for_subsample <- 150L
    if (case_n >= min_events_for_subsample) {
      df <- slice_sample(df, n = trial_max_n)
    }
  }
  df
}

## Helper function to train/test-split data and perform cross-validation
set.seed(1234)

make_splits_and_folds <- function(df, outcome) {

  v_folds <- if (trial_mode) 3L else 10L

  split_obj <- initial_split(
    df,
    strata = !!rlang::sym(outcome),
    prop   = 0.8
  )

  train_df <- training(split_obj)
  test_df  <- testing(split_obj)

  folds <- vfold_cv(
    train_df,
    v      = v_folds,
    strata = !!rlang::sym(outcome)
  )

  list(
    split  = split_obj,
    train  = train_df,
    test   = test_df,
    folds  = folds
  )
}


## Helper function for recipes
make_ml_recipe <- function(df, outcome, predictors) {
  recipe(
    formula = stats::as.formula(paste(outcome, "~", ".")),
    data    = df
  ) |>
    step_dummy(all_nominal_predictors()) |>
    step_zv(all_predictors()) |>
    step_normalize(all_predictors())
}


# Model specifications-------------------------------------------------------------------------
lr_spec <-
  logistic_reg() |>
  set_engine("glm") |>
  set_mode("classification")

lasso_spec <-
  logistic_reg(
    penalty = tune(),
    mixture = 1
  ) |>
  set_engine("glmnet") |>
  set_mode("classification")

rf_spec_base <-
  rand_forest(
    trees = if (trial_mode) 500L else 1000L,
    mtry  = tune(),
    min_n = 5L
  ) |>
  set_engine("randomForest", importance = TRUE) |>
  set_mode("classification")

xgb_spec_base <-
  boost_tree(
    trees          = tune(),
    tree_depth     = tune(),
    learn_rate     = tune(),
    min_n          = 5L,
    sample_size    = 1.0,
    loss_reduction = 0
  ) |>
  set_mode("classification") |>
  set_engine("xgboost")


## Helper function for logistic regression metrics and variable importance
get_lr_vi <- function(lr_fit_workflow) {
  glm_fit <- extract_fit_engine(lr_fit_workflow)
  broom::tidy(glm_fit) |>
    filter(term != "(Intercept)") |>
    transmute(
      Variable   = term,
      Importance = abs(estimate),
      model      = "Logistic regression"
    )
}


# Machine learning pipeline -------------------------------------------------------------------
run_ml_pipeline <- function(predictors, predictor_set_label) {
  ## Datasets and sample sizes
  ml_data_list <- purrr::imap(
    ml_outcomes,
    ~ prepare_ml_data(
      data       = .x$data,
      outcome    = .x$outcome,
      predictors = predictors
    )
  )

  ml_sample_sizes <- purrr::imap_dfr(
    ml_data_list,
    ~ tibble(
      outcome        = .y,
      outcome_label  = ml_outcomes[[.y]]$label,
      N_ml           = nrow(.x),
      predictor_set  = predictor_set_label
    )
  )

  ml_sample_sizes_gt <- ml_sample_sizes |>
    gt() |>
    tab_header(
      title = paste0("ML analytic sample sizes by outcome — ", predictor_set_label)
    ) |>
    cols_label(
      outcome       = "Outcome (internal name)",
      outcome_label = "Outcome",
      N_ml          = "Complete-case N (ML)",
      predictor_set = "Predictor set"
    )

  ## Train/test splits and cross-validation folds
  ml_resampling <- purrr::imap(
    ml_data_list,
    ~ make_splits_and_folds(.x, outcome = ml_outcomes[[.y]]$outcome)
  )

  ## Recipes and feature counts
  ml_recipes <- purrr::imap(
    ml_resampling,
    ~ make_ml_recipe(
      df         = .x$train,
      outcome    = ml_outcomes[[.y]]$outcome,
      predictors = predictors
    )
  )

  ml_feature_counts <- purrr::imap_dfr(
    ml_recipes,
    ~ {
      prep_rec <- prep(.x)
      baked    <- juice(prep_rec)
      tibble(
        outcome        = .y,
        outcome_label  = ml_outcomes[[.y]]$label,
        n_features     = ncol(baked) - 1,
        predictor_set  = predictor_set_label
      )
    }
  )

  ml_feature_counts_gt <- ml_feature_counts |>
    gt() |>
    tab_header(
      title = paste0("Number of ML predictors after preprocessing — ", predictor_set_label)
    ) |>
    cols_label(
      outcome       = "Outcome (internal name)",
      outcome_label = "Outcome",
      n_features    = "Number of predictors",
      predictor_set = "Predictor set"
    )

  ## Tuning grids
  lasso_grid <- grid_regular(
    penalty(),
    levels = if (trial_mode) 5L else 30L
  )

  rf_grid <- grid_regular(
    mtry(range = c(5L, min(30L, length(predictors)))),
    levels = if (trial_mode) 3L else 5L
  )

  set.seed(1234)
  xgb_grid <- grid_latin_hypercube(
    trees(range = c(100L, 300L)),
    tree_depth(range = c(3L, 5L)),
    learn_rate(range = c(-2, -1)),
    size = if (trial_mode) 10L else 20L
  )

  ## Workflows per outcome
  make_workflows_for_outcome <- function(recipe_obj, outcome_name, train_df) {
    list(
      lr = workflow() |>
        add_model(lr_spec) |>
        add_recipe(recipe_obj),

      lasso = workflow() |>
        add_model(lasso_spec) |>
        add_recipe(recipe_obj),

      rf = workflow() |>
        add_model(rf_spec_base) |>
        add_recipe(recipe_obj),

      xgb = workflow() |>
        add_model(xgb_spec_base) |>
        add_recipe(recipe_obj)
    )
  }

  ml_workflows <- purrr::imap(
    ml_resampling,
    ~ make_workflows_for_outcome(
      recipe_obj   = ml_recipes[[.y]],
      outcome_name = .y,
      train_df     = .x$train
    )
  )

  ctrl_resamples <- control_grid(
    save_pred     = TRUE,
    parallel_over = "resamples",
    verbose       = TRUE
  )

  ## Model tuning and cross-validation for each outcome
  tune_models_for_outcome <- function(outcome_name) {
    resampling    <- ml_resampling[[outcome_name]]
    workflows_out <- ml_workflows[[outcome_name]]

    lr_res <- workflows_out$lr |>
      fit_resamples(
        resamples = resampling$folds,
        control   = control_resamples(save_pred = TRUE),
        metrics   = metric_set(
          roc_auc
        )
      )

    lasso_res <- workflows_out$lasso |>
      tune_grid(
        resamples = resampling$folds,
        grid      = lasso_grid,
        control   = ctrl_resamples,
        metrics   = metric_set(roc_auc)
      )

    rf_res <- workflows_out$rf |>
      tune_grid(
        resamples = resampling$folds,
        grid      = rf_grid,
        control   = ctrl_resamples,
        metrics   = metric_set(roc_auc)
      )

    xgb_res <- workflows_out$xgb |>
      tune_grid(
        resamples = resampling$folds,
        grid      = xgb_grid,
        control   = ctrl_resamples,
        metrics   = metric_set(roc_auc)
      )

    list(
      lr    = lr_res,
      lasso = lasso_res,
      rf    = rf_res,
      xgb   = xgb_res
    )
  }

  set.seed(1234)
  ml_tuning_results <- purrr::imap(
    ml_outcomes,
    ~ tune_models_for_outcome(.y)
  )

  ## Best hyperparameters
  get_best_params <- function(tune_res) {
    list(
      lasso = select_best(tune_res$lasso, metric = "roc_auc"),
      rf    = select_best(tune_res$rf,    metric = "roc_auc"),
      xgb   = select_best(tune_res$xgb,   metric = "roc_auc")
    )
  }

  ml_best_params <- purrr::imap(
    ml_tuning_results,
    ~ get_best_params(.x)
  )

  ## Final model fits and test-set performance
  fit_and_evaluate_final_models <- function(outcome_name) {
    outcome_info  <- ml_outcomes[[outcome_name]]
    resampling    <- ml_resampling[[outcome_name]]
    workflows_out <- ml_workflows[[outcome_name]]
    best_pars     <- ml_best_params[[outcome_name]]
    outcome_var   <- outcome_info$outcome

    train_df <- resampling$train
    test_df  <- resampling$test

    ### Logistic regression
    final_lr_fit <- workflows_out$lr |>
      fit(data = train_df)

    lr_preds_test <- bind_cols(
      truth = test_df[[outcome_var]],
      predict(final_lr_fit, test_df),
      predict(final_lr_fit, test_df, type = "prob")
    )

    lr_metrics <- roc_auc(
      lr_preds_test,
      truth,
      .pred_Yes,
      event_level = "second"
    ) |>
      mutate(
        outcome          = outcome_info$label,
        model            = "Logistic regression",
        set              = "Full predictors",
        outcome_internal = outcome_name,
        predictor_set    = predictor_set_label
      )

    ### LASSO
    final_lasso_wf <- workflows_out$lasso |>
      finalize_workflow(best_pars$lasso)

    final_lasso_fit <- final_lasso_wf |>
      fit(data = train_df)

    lasso_preds_test <- bind_cols(
      truth = test_df[[outcome_var]],
      predict(final_lasso_fit, test_df),
      predict(final_lasso_fit, test_df, type = "prob")
    )

    lasso_metrics <- roc_auc(
      lasso_preds_test,
      truth,
      .pred_Yes,
      event_level = "second"
    ) |>
      mutate(
        outcome          = outcome_info$label,
        model            = "LASSO",
        set              = "Full predictors",
        outcome_internal = outcome_name,
        predictor_set    = predictor_set_label
      )

    ### Random forest
    final_rf_wf <- workflows_out$rf |>
      finalize_workflow(best_pars$rf)

    final_rf_fit <- final_rf_wf |>
      fit(data = train_df)

    rf_preds_test <- bind_cols(
      truth = test_df[[outcome_var]],
      predict(final_rf_fit, test_df),
      predict(final_rf_fit, test_df, type = "prob")
    )

    rf_metrics <- roc_auc(
      rf_preds_test,
      truth,
      .pred_Yes,
      event_level = "second"
    ) |>
      mutate(
        outcome          = outcome_info$label,
        model            = "Random forest",
        set              = "Full predictors",
        outcome_internal = outcome_name,
        predictor_set    = predictor_set_label
      )

    ### XGBoost
    final_xgb_wf <- workflows_out$xgb |>
      finalize_workflow(best_pars$xgb)

    final_xgb_fit <- final_xgb_wf |>
      fit(data = train_df)

    xgb_preds_test <- bind_cols(
      truth = test_df[[outcome_var]],
      predict(final_xgb_fit, test_df),
      predict(final_xgb_fit, test_df, type = "prob")
    )

    xgb_metrics <- roc_auc(
      xgb_preds_test,
      truth,
      .pred_Yes,
      event_level = "second"
    ) |>
      mutate(
        outcome          = outcome_info$label,
        model            = "XGBoost",
        set              = "Full predictors",
        outcome_internal = outcome_name,
        predictor_set    = predictor_set_label
      )

    list(
      metrics = bind_rows(
        lr_metrics,
        lasso_metrics,
        rf_metrics,
        xgb_metrics
      ),
      preds = list(
        lr    = lr_preds_test,
        lasso = lasso_preds_test,
        rf    = rf_preds_test,
        xgb   = xgb_preds_test
      ),
      final_fits = list(
        lr    = final_lr_fit,
        lasso = final_lasso_fit,
        rf    = final_rf_fit,
        xgb   = final_xgb_fit
      )
    )
  }

  ml_final_results <- purrr::imap(
    ml_outcomes,
    ~ fit_and_evaluate_final_models(.y)
  )

  ## Performance summary data
  ml_performance_summary_long <- purrr::map_dfr(
    names(ml_final_results),
    ~ ml_final_results[[.x]]$metrics
  )

  ml_performance_table <- ml_performance_summary_long |>
    select(
      predictor_set,
      outcome_internal,
      outcome,
      model,
      set,
      .metric,
      .estimate
    ) |>
    pivot_wider(
      id_cols     = c(predictor_set, outcome_internal, outcome, model, set),
      names_from  = .metric,
      values_from = .estimate
    ) |>
    rename(
      AUROC = roc_auc
    ) |>
    arrange(outcome, desc(AUROC))

  ml_performance_table_gt <- ml_performance_table |>
    gt(rowname_col = "model", groupname_col = "outcome") |>
    tab_header(
      title = paste0("Test-set performance of ML models by outcome — ",
                     predictor_set_label)
    ) |>
    fmt_number(
      columns = c(AUROC),
      decimals = 3
    ) |>
    cols_label(
      predictor_set    = "Predictor set",
      outcome_internal = "Outcome (internal)",
      outcome          = "Outcome",
      set              = "Predictor usage",
      AUROC            = "AUROC"
    )

  ## Variable importance data
  get_vi_all_models_for_outcome <- function(outcome_name) {
    fits <- ml_final_results[[outcome_name]]$final_fits

    lr_vi <- get_lr_vi(fits$lr)

    lasso_vi <- vip::vi(fits$lasso) |>
      mutate(
        model    = "LASSO",
        Variable = as.character(Variable)
      )

    rf_vi <- vip::vi(fits$rf) |>
      mutate(
        model    = "Random forest",
        Variable = as.character(Variable)
      )

    xgb_vi <- vip::vi(fits$xgb) |>
      mutate(
        model    = "XGBoost",
        Variable = as.character(Variable)
      )

    bind_rows(lr_vi, lasso_vi, rf_vi, xgb_vi)
  }

  ml_vi_all <- purrr::imap(
    ml_outcomes,
    ~ get_vi_all_models_for_outcome(.y)
  )

  ## Pipeline outputs
  list(
    predictor_set_label      = predictor_set_label,
    predictors               = predictors,
    ml_sample_sizes          = ml_sample_sizes,
    ml_sample_sizes_gt       = ml_sample_sizes_gt,
    ml_feature_counts        = ml_feature_counts,
    ml_feature_counts_gt     = ml_feature_counts_gt,
    ml_data_list             = ml_data_list,
    ml_resampling            = ml_resampling,
    ml_recipes               = ml_recipes,
    ml_tuning_results        = ml_tuning_results,
    ml_best_params           = ml_best_params,
    ml_final_results         = ml_final_results,
    ml_performance_table     = ml_performance_table,
    ml_performance_table_gt  = ml_performance_table_gt,
    ml_vi_all                = ml_vi_all
  )
}

# Run machine learning pipelines for set 1 (core covariates) and set 2 (individual nutritional covariates)------------------------------------------------------------------------------------
results_set1 <- run_ml_pipeline(
  predictors          = predictors_set1,
  predictor_set_label = "Set 1: main regression predictors"
)

results_set2 <- run_ml_pipeline(
  predictors          = predictors_set2,
  predictor_set_label = "Set 2: nutrient quartiles"
)

ml_performance_all_stage1 <- bind_rows(
  results_set1$ml_performance_table,
  results_set2$ml_performance_table
)

# ROC curves-----------------------------------------------------------------------------------
## ROC curves for all outcomes by predictors
make_roc_panel <- function(results_obj, title_text) {

  desired_order <- c(
    "Asthma",
    "Allergic rhinitis",
    "Inflammatory\narthritis",
    "Diabetes",
    "Hypertension",
    "Eczema",
    "Food allergy"
  )

  outcome_label_map <- c(
    asthma    = "Asthma",
    ar        = "Allergic rhinitis",
    arthritis = "Inflammatory\narthritis",
    diabetes  = "Diabetes",
    htn       = "Hypertension",
    ad        = "Eczema",
    fa        = "Food allergy"
  )

  ### Model color mapping
  custom_colors <- c(
    "LR"    = "#E69F00",  # logistic regression
    "Lasso" = "#009E73",  # LASSO
    "RF"    = "#0072B2",  # random forest
    "XGB"   = "#CC79A7"   # XGBoost
  )

  outcome_names <- names(ml_outcomes)

  roc_long <- purrr::map_dfr(
    outcome_names,
    function(out_nm) {
      preds         <- results_obj$ml_final_results[[out_nm]]$preds
      outcome_label <- outcome_label_map[[out_nm]]

      bind_rows(
        roc_curve(preds$lr,    truth, .pred_Yes, event_level = "second") |>
          mutate(model = "LR"),
        roc_curve(preds$lasso, truth, .pred_Yes, event_level = "second") |>
          mutate(model = "Lasso"),
        roc_curve(preds$rf,    truth, .pred_Yes, event_level = "second") |>
          mutate(model = "RF"),
        roc_curve(preds$xgb,   truth, .pred_Yes, event_level = "second") |>
          mutate(model = "XGB")
      ) |>
        mutate(
          outcome = factor(outcome_label, levels = desired_order)
        )
    }
  )

  ggplot(
    roc_long,
    aes(x = 1 - specificity, y = sensitivity, color = model)
  ) +
    geom_line(linewidth = 0.9) +
    geom_abline(linetype = "dashed") +
    coord_equal() +
    facet_wrap(~ outcome, ncol = 5, drop = FALSE) +
    labs(
      title = title_text,
      x     = "1 − Specificity",
      y     = "Sensitivity",
      color = "Model"
    ) +
    scale_color_manual(
      values = custom_colors,
      breaks = c("LR", "Lasso", "RF", "XGB")
    ) +
    theme_minimal(base_size = 12) +
    theme(
      plot.title   = element_text(size = 14, face = "bold", hjust = 0.5),
      axis.title.x = element_text(size = 12, face = "bold"),
      axis.title.y = element_text(size = 12, face = "bold"),
      legend.position = "bottom",
      legend.title    = element_text(size = 12, face = "bold"),
      legend.text     = element_text(size = 12),
      strip.text      = element_text(size = 12, face = "bold"),
      panel.spacing.x = grid::unit(1.2, "lines"),
      panel.spacing.y = grid::unit(1.2, "lines")
    )
}

## ROC curves for models with core covariates
roc_panel_set1 <- make_roc_panel(
  results_set1,
  "Figure 4: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Core Covariates"
)

## ROC curves for models with individual nutritional covariates
roc_panel_set2 <- make_roc_panel(
  results_set2,
  "Figure 5: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Individual Nutritional Covariates"
)

# Machine learning model performance summary tables -------------------------------------------
## Desired outcome order
desired_outcome_order <- c(
  "Asthma",
  "Allergic rhinitis",
  "Inflammatory arthritis",
  "Diabetes",
  "Hypertension",
  "Eczema",
  "Food allergy"
)

## Core covariates performance summary (set 1)
core_perf <- results_set1$ml_performance_table |>
  mutate(
    outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
    outcome = factor(outcome, levels = desired_outcome_order),
    model   = factor(
      model,
      levels = c(
        "Logistic regression",
        "LASSO",
        "Random forest",
        "XGBoost"
      )
    )
  ) |>
  arrange(outcome, model)

core_perf_gt <- core_perf |>
  select(
    outcome,
    model,
    AUROC
  ) |>
  gt(rowname_col = "model", groupname_col = "outcome") |>
  tab_header(
    title = md("**Table 11: Machine Learning Model Performance Summary for Models With Core Covariates**")
  ) |>
  fmt_number(
    columns = c(AUROC),
    decimals = 3
  ) |>
  cols_label(
    outcome = md("**Disease**"),
    model   = md("**Model**"),
    AUROC   = md("**AUROC**")
  ) |>
  tab_options(
    table.width = pct(70)
  ) |>
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = list(
      cells_title("title"),
      cells_column_labels(everything()),
      cells_row_groups()
    )
  )

## Individual nutritional covariates performance summary (set 2)
nutr_perf <- results_set2$ml_performance_table |>
  mutate(
    outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
    outcome = factor(outcome, levels = desired_outcome_order),
    model   = factor(
      model,
      levels = c(
        "Logistic regression",
        "LASSO",
        "Random forest",
        "XGBoost"
      )
    )
  ) |>
  arrange(outcome, model)

nutr_perf_gt <- nutr_perf |>
  select(
    outcome,
    model,
    AUROC
  ) |>
  gt(rowname_col = "model", groupname_col = "outcome") |>
  tab_header(
    title = md("**Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates**")
  ) |>
  fmt_number(
    columns = c(AUROC),
    decimals = 3
  ) |>
  cols_label(
    outcome = md("**Disease**"),
    model   = md("**Model**"),
    AUROC   = md("**AUROC**")
  ) |>
  tab_options(
    table.width = pct(70)
  ) |>
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = list(
      cells_title("title"),
      cells_column_labels(everything()),
      cells_row_groups()
    )
  )


# Settings for ranked variable importance figures----------------------------------------------
## Disease-specific color map
disease_color_map <- c(
  "Asthma"                 = "#5E3C99",
  "Allergic Rhinitis"      = "#89CFF0",
  "Inflammatory Arthritis" = "#009E73",
  "Diabetes"               = "#F0E442",
  "Hypertension"           = "#D55E00",
  "Eczema"                 = "#E78AC3",
  "Food Allergy"           = "#4B2E0F"
)

make_vi_plot_single <- function(results_obj,
                                outcome_name,
                                disease_title,
                                predictor_set_desc,
                                figure_label,
                                top_n = 10) {

  vi_df <- results_obj$ml_vi_all[[outcome_name]]

  vi_df <- vi_df |>
    filter(!is.na(Importance), Importance != 0)

  vi_df <- vi_df |>
    mutate(
      model = factor(
        model,
        levels = c(
          "Logistic regression",
          "LASSO",
          "Random forest",
          "XGBoost"
        )
      )
    )

  vi_plot_df <- vi_df |>
    group_by(model) |>
    slice_max(order_by = Importance, n = top_n, with_ties = FALSE) |>
    arrange(Importance, .by_group = TRUE) |>
    mutate(
      Variable_plot = paste(model, Variable, sep = "__"),
      Variable_plot = factor(Variable_plot, levels = unique(Variable_plot))
    ) |>
    ungroup()

  ggplot(
    vi_plot_df,
    aes(x = Importance, y = Variable_plot)
  ) +
    geom_col(fill = disease_color_map[disease_title]) +
    facet_wrap(~ model, scales = "free") +
    scale_y_discrete(
      labels = function(x) sub("^[^_]+__", "", x)
    ) +
    labs(
      title = paste0(
        figure_label,
        ": Ranked Feature Importance of ",
        predictor_set_desc,
        "\nin Machine Learning Models for ",
        disease_title
      ),
      x = "Importance",
      y = "Predictor"
    ) +
    theme_minimal(base_size = 12) +
    theme(
      plot.title   = element_text(size = 14, face = "bold", hjust = 0.5),
      axis.title.x = element_text(size = 12, face = "bold"),
      axis.title.y = element_text(size = 12, face = "bold"),
      strip.text   = element_text(size = 11, face = "bold"),
      panel.spacing = grid::unit(1, "lines")
    )
}

disease_title_map <- c(
  asthma    = "Asthma",
  ar        = "Allergic Rhinitis",
  arthritis = "Inflammatory Arthritis",
  diabetes  = "Diabetes",
  htn       = "Hypertension",
  ad        = "Eczema",
  fa        = "Food Allergy"
)

core_figure_labels <- c(
  asthma    = "Figure 6",
  ar        = "Figure 7",
  arthritis = "Figure 8",
  diabetes  = "Figure 9",
  htn       = "Figure 10",
  ad        = "Figure 11",
  fa        = "Figure 12"
)

nutr_figure_labels <- c(
  asthma    = "Figure 13",
  ar        = "Figure 14",
  arthritis = "Figure 15",
  diabetes  = "Figure 16",
  htn       = "Figure 17",
  ad        = "Figure 18",
  fa        = "Figure 19"
)

# Generate core covariate (set 1) variable importance plots------------------------------------
vi_core_plots <- purrr::imap(
  disease_title_map,
  ~ make_vi_plot_single(
      results_obj        = results_set1,
      outcome_name       = .y,
      disease_title      = .x,
      predictor_set_desc = "Core Covariates",
      figure_label       = core_figure_labels[[.y]]
    )
)

vi_core_asthma       <- vi_core_plots$asthma      # Figure 6
vi_core_ar           <- vi_core_plots$ar          # Figure 7
vi_core_arthritis    <- vi_core_plots$arthritis   # Figure 8
vi_core_diabetes     <- vi_core_plots$diabetes    # Figure 9
vi_core_htn          <- vi_core_plots$htn         # Figure 10
vi_core_eczema       <- vi_core_plots$ad          # Figure 11
vi_core_food_allergy <- vi_core_plots$fa          # Figure 12

# Generate individual nutritional covariate (set 2) variable importance plots -----------------
vi_nutr_plots <- purrr::imap(
  disease_title_map,
  ~ make_vi_plot_single(
      results_obj        = results_set2,
      outcome_name       = .y,
      disease_title      = .x,
      predictor_set_desc = "Individual Nutritional Covariates",
      figure_label       = nutr_figure_labels[[.y]]
    )
)

vi_nutr_asthma       <- vi_nutr_plots$asthma      # Figure 13
vi_nutr_ar           <- vi_nutr_plots$ar          # Figure 14
vi_nutr_arthritis    <- vi_nutr_plots$arthritis   # Figure 15
vi_nutr_diabetes     <- vi_nutr_plots$diabetes    # Figure 16
vi_nutr_htn          <- vi_nutr_plots$htn         # Figure 17
vi_nutr_eczema       <- vi_nutr_plots$ad          # Figure 18
vi_nutr_food_allergy <- vi_nutr_plots$fa          # Figure 19

# Print ML tables and figures------------------------------------------------------------------
## Core covariates performance summary
core_perf_gt
Table 11: Machine Learning Model Performance Summary for Models With Core Covariates
AUROC
Asthma
Logistic regression 0.612
LASSO 0.607
Random forest 0.608
XGBoost 0.621
Allergic rhinitis
Logistic regression 0.658
LASSO 0.654
Random forest 0.618
XGBoost 0.659
Inflammatory arthritis
Logistic regression 0.709
LASSO 0.637
Random forest 0.762
XGBoost 0.704
Diabetes
Logistic regression 0.827
LASSO 0.921
Random forest 0.831
XGBoost 0.894
Hypertension
Logistic regression 0.760
LASSO 0.764
Random forest 0.720
XGBoost 0.732
Eczema
Logistic regression 0.439
LASSO 0.449
Random forest 0.601
XGBoost 0.533
Food allergy
Logistic regression 0.575
LASSO 0.589
Random forest 0.486
XGBoost 0.495
## Individual nutritional covariates performance summary
nutr_perf_gt
Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates
AUROC
Asthma
Logistic regression 0.527
LASSO 0.517
Random forest 0.462
XGBoost 0.499
Allergic rhinitis
Logistic regression 0.540
LASSO 0.550
Random forest 0.517
XGBoost 0.542
Inflammatory arthritis
Logistic regression 0.652
LASSO 0.516
Random forest 0.463
XGBoost 0.508
Diabetes
Logistic regression 0.594
LASSO 0.594
Random forest 0.599
XGBoost 0.515
Hypertension
Logistic regression 0.552
LASSO 0.551
Random forest 0.536
XGBoost 0.531
Eczema
Logistic regression 0.475
LASSO 0.500
Random forest 0.521
XGBoost 0.544
Food allergy
Logistic regression 0.510
LASSO 0.509
Random forest 0.543
XGBoost 0.520
## ROC panels
print(roc_panel_set1)

print(roc_panel_set2)

## Core covariates variable importance
print(vi_core_asthma)

print(vi_core_ar)

print(vi_core_arthritis)

print(vi_core_diabetes)

print(vi_core_htn)

print(vi_core_eczema)

print(vi_core_food_allergy)

## Individual nutritional covariates variable importance
print(vi_nutr_asthma)

print(vi_nutr_ar)

print(vi_nutr_arthritis)

print(vi_nutr_diabetes)

print(vi_nutr_htn)

print(vi_nutr_eczema)

print(vi_nutr_food_allergy)

5 Conclusion

In this nationally representative cohort of U.S. adults aged 20–40 years, I evaluated whether dietary inflammatory potential, measured via the Dietary Inflammatory Index, was associated with allergic, immunologic, and cardiometabolic outcomes. This was performed using both survey-weighted regression and exploratory machine learning analyses. In regression analyses, signals for higher allergic, immunologic, and/or cardiometabolic disease burden and pro-inflammatory diet were detected, but these associations largely attenuated after adjustment for sociodemographic, behavioral, and anthropometric factors. Nutrient-specific analyses identified a few potentially protective micro- and macronutrients, though most associations were modest and not consistently significant. Interestingly, in machine learning models, DII consistently ranked among the top predictors in models that included core covariates, including sociodemographic and clinical variables, although machine learning model findings should be interpreted with caution given they were not robust in their predictive performance. Overall while DII may not independently predict diseases, it may play an incremental role in driving these diseases.

This study has several strengths, including use of nationally representative NHANES data, implementaiton of survey weighting, inclusion of a validated dietary inflammatory score and nutrient composition database, and the use of complementary analytic frameworks to capture both adjusted associations and potentially important predictors in the context of machine learning analyses. However, several important limitations must be acknowledged. First, the study’s cross-sectional design precludes causal inference. Second, the accuracy of self-reported medical diagnoses and 24-hour dietary recalls may be inaccurate. Moreover, self-reported eczema and food allergy were only available in restricted NHANES cycles, a reflection of survey cycle heterogeneity and shifting survey priorities. As analyses relied on complete cases, the potential impact of missing data on the results is unclear. For the machine learning models, several outcomes yielded AUROC values near 0.5, indicating limited predictive performance ability and, in turn, limiting the interpretation of the importance of topmost predictors. In addition, the cross-sectional nature of the study limited the ability of these models to capture temporal patterns.

Future directions for this work include examining broader age ranges and a wider set of candidate diseases to better characterize how dietary inflammatory potential relates to risk across the lifespan. Longitudinal cohort studies, in which dietary patterns and clinical outcomes are tracked over time, would offer a complementary approach for addressing these questions and could be paired with multi-omic data, such as immune and metabolic biomarkers and microbiome profiles. In addition, future machine learning models could focus on enhancing model performance by identifying a more predictive set of features not considered in this analysis. Continued study and refinement of disease risk models that integrate dietary information may reveal insights to inform prevention and management strategies for chronic immune-mediated and metabolic diseases.

6 References

[1] Annesi-Maesano I, Fleddermann M, Hornef M, et al. Allergic diseases in infancy: Epidemiology and current interpretation. World Allergy Organ J. 2021;14(11):100591.

[2] Boddu SK, Giannini C, Marcovecchio ML. Metabolic disorders in young people around the world. Diabetologia.2025;68(11):2374–2385.

[3] Kim H, Sitarik AR, Woodcroft K, Johnson CC, Zoratti E. Birth mode, breastfeeding, pet exposure, and antibiotics: Associations with the gut microbiome and sensitization. Curr Allergy Asthma Rep. 2019;19(4):22.

[4] Sbihi H, Boutin RC, Cutler C, Suen M, Finlay BB, Turvey SE. How early-life environmental exposures shape the gut microbiome and influence allergic disease. Allergy. 2019;74(11):2103–2115.

[5] Kuniyoshi Y, Tsujimoto Y, Banno M, et al. Association of obesity or metabolic syndrome with allergic diseases: Overview of reviews. Obes Rev. 2025;26(3):e13862.

[6] Chang CL, Ali GB, Pham J, et al. Childhood BMI trajectories and asthma and allergies: A systematic review. Clin Exp Allergy. 2023;53(9):911–929.

[7] Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data, 2005–2012. Hyattsville, MD: US Department of Health and Human Services. Available from: https://wwwn.cdc.gov/nchs/nhanes/

[8] Xu Z. Improving the dietaryindex R package: A proposal to include additional components for more accurate DII computation in NHANES. Am J Clin Nutr. 2025 Jan;121(1):174–175. Epub 2024 Nov 9.

[9] Zhan JJ, Hodge RA, Dunlop AL, Lee MM, Bui L, Liang D, Ferranti EP. Dietaryindex: a user-friendly and versatile R package for standardizing dietary pattern analysis in epidemiological and clinical studies. Am J Clin Nutr. 2024 Nov;120(5):1165–1174. Epub 2024 Aug 23.

[10] Qing L, Zhu Y, Yu C, Zhang Y, Ni J. Exploring the association between dietary Inflammatory Index and chronic pain in US adults using NHANES 1999-2004. Sci Rep. 2024 Apr 16;14(1):8726.

7 Acknowledgements

I would like to thank Drs. Robert Grundmeier and Rich Tsui for their guidance on the analytic and methodological approaches used in this study. I am also grateful to the BMIN 5030 professors and teaching assistants for helping me expand my foundation in data analysis throughout the course. In addition to consulting with my mentors, several references and tools were utilized to aid in completing this project, including prior course lecture slides, practicum exercises, and online troubleshooting resources including Stack Overflow, Posit Forum, and ChatGPT (OpenAI). ChatGPT was used for selective assistance with code debugging and for designing template helper functions to automate repetitive elements of data processing. All analytic decisions, data processing, modeling choices, and final code were conceptualized, implemented, and reviewed by me with input from my co-mentors. I also wish to acknowledge the dietaryindexNDP package, which enabled calculation of DII scores that were key to this analysis.